Technical Report # KU-EC-12-1: 
New Cell-Specific and Overall Tests of Spatial Interaction Based on 
Nearest Neighbor Contingency Tables 

Elvan Ceyhan * 
March 7, 2013 



Abstract 

Spatial interaction patterns such as segregation and association can be tested using nearest neighbor 
contingency tables (NNCTs). We introduce new cell-specific (or pairwise) and overall segregation tests 
and determine their asymptotic distributions. In particular, we demonstrate that cell-specific tests enjoy 
asymptotic normality, while overall tests have chi-square distributions asymptotically. We also perform 
an extensive Monte Carlo simulation study to compare the finite sample performance of the tests in terms 
of empirical size and power. In addition to the cell-specific tests as post-hoc tests for overall tests, we 
discuss one-class-versus-rest type of NNCT-tests after an overall test yields significant interaction. We 
also introduce the concepts of total, strong, and partial segregation/association to label levels of these 
patterns. We compare these new tests with the existing NNCT-tests in literature with simulations as well 
and illustrate the NNCT-tests on an ecological data set. 

Keywords: Association; completely mapped data; complete spatial randomness; random labeling; segregation 



* corresponding author. 

e-mail: elceyhan@ku.edu.tr (E. Ceyhan) 






1 Introduction 

Multivariate clustering patterns such as segregation and association result from multivariate interaction
between two or more classes (or species). For convenience, categories of the points or units are referred to as
"classes", e.g., a class can stand for species, sex, or some other characteristic of the unit/subject. Segregation
is the spatial pattern in which points from the same class are closer to each other, while association is the
pattern in which points from different classes are closer to each other. These patterns may have important
implications in ecology, plant biology, or epidemiology. See, for example, Whipple (1980), Diggle (2003), and
Hamill and Wright (1986). In particular, in ecology, two tree species could be highly dependent on each other
(as a result of, say, symbiosis or mutualism), and thus coexist in close vicinity (i.e., they are associated),
or they could be enjoying the company of conspecifics and so form one-class clumps or groups (i.e., they
are segregated). In epidemiology, cases might be clustered compared to controls, due to the infectious nature
of a disease or closeness to a source of the disease (i.e., cases and controls are segregated). In a social
context, segregation of residences due to socioeconomic status or ethnicity can be investigated by generative
models (Fossett (2011)). In the literature, spatial segregation is also used to refer to a univariate pattern of
spatial clustering (Robertson and Cushing (2011)), which is referred to as aggregation (Ceyhan (2008)). In a
social network, segregation of individuals is also modeled via random graph theoretical tools (Henry et al.
(2011)). In veterinary epidemiology, a nonparametric method for detecting spatial segregation according to
the genotype and year of occurrence of bovine tuberculosis is employed by Diggle et al. (2005).



* Department of Mathematics, Koç University, Sarıyer, 34450, Istanbul, Turkey.



Many (univariate or multivariate) spatial clustering tests are proposed in the literature (see Kulldorff (2006) for
an extensive review). These methods include Ripley's K-function (Ripley (2004)), the J-function (van Lieshout and
Baddeley (1999)), nearest neighbor (NN) methods (Diggle (2003)), and so on. Among NN methods, this article
concerns the nearest neighbor contingency tables (NNCTs). Pielou (1961) introduced various tests based on
NNCTs; however, Dixon (1994) extended these tests in various directions, and also determined the correct
asymptotic distribution of the proposed tests. Ceyhan (2008, 2009) compared NNCT-tests in the literature, and
also proposed various tests based on NNCTs.

In this article, we introduce various new cell-specific segregation tests and overall tests based on these
cell-specific tests. We compare these tests with the existing NNCT-tests in the literature (Dixon (1994, 2002),
Ceyhan (2010)). We demonstrate that cell-specific tests are asymptotically normal, and overall tests tend to
chi-square tests with the corresponding degrees of freedom. In practice, cell-specific tests serve as post-hoc
tests to be performed when an overall test yields a significant result. As an alternative post-hoc test after
a significant overall test, we discuss one-class-versus-rest type of NNCT-tests. By extensive simulations, we
compare the newly proposed tests to the ones in the literature in terms of empirical size and power, and so
demonstrate that one type of the new cell-specific tests and overall tests performs better compared to other
tests.

We describe the NNCTs and provide the null and alternative patterns in Section 2, provide the cell-specific
tests in Section 3, overall tests in Section 4, empirical significance levels in the two- and three-class cases in
Sections 5 and 6, respectively, and empirical power analysis under segregation and association in the two-class
and three-class cases in Sections 7 and 8, respectively. We present the empirical size and power analysis for
the one-vs-rest type testing in the three-class case in Section 9, the illustration on the example data set in
Section 10, and our conclusions and guidelines for using the tests in Section 11.



2 Null and Alternative Spatial Patterns and NNCTs 

We describe the spatial point patterns for two classes only; the extension to multi-class case is straightforward. 
Our null hypothesis is 

$H_o$: randomness in the NN structure

which may result from random labeling (RL) or independence of points from two classes. Under independence, 
the two classes result independently from the same stochastic process, hence their spatial distribution is 
identical. In this article, among independence patterns we will only consider complete spatial randomness 
(CSR) of points from two classes. Roughly, under CSR independence, two classes are independently uniformly 
distributed in a region of interest, while RL is the pattern in which, given a fixed set of points in a region, 
class labels are assigned to these fixed points randomly so that the labels are independent of the locations. 
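The two null patterns described above can be sketched in a few lines. This is a minimal illustration, assuming NumPy and the unit square as the region of interest; the function names `simulate_csr_independence` and `random_labeling` are hypothetical and are not taken from the report.

```python
import numpy as np

def simulate_csr_independence(n1, n2, rng):
    """CSR independence: both classes are independently uniform on (0,1) x (0,1)."""
    points = rng.uniform(0.0, 1.0, size=(n1 + n2, 2))
    labels = np.repeat([0, 1], [n1, n2])  # first n1 points are class 0, the rest class 1
    return points, labels

def random_labeling(labels, rng):
    """RL: locations stay fixed; only the class labels are randomly permuted."""
    return rng.permutation(labels)

rng = np.random.default_rng(0)
points, labels = simulate_csr_independence(10, 30, rng)
relabeled = random_labeling(labels, rng)
```

Under RL the class sizes are preserved by construction, since the labels are only permuted over the fixed locations.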

As alternatives, we consider two major types of spatial patterns: segregation and association. Segregation
occurs if the NN of an individual is more likely to be of the same class as the individual than to be from a
different class (see, e.g., Pielou (1961)). Association occurs if the NN of an individual is more likely to be
from another class than to be of the same class as the individual. These patterns are not symmetric, e.g., for
two classes, one class might be more associated with another class. For example, plant species X could be
more dependent on species Y, so occurs in close vicinity of Y plants, while the reverse relation may not be in
the same level or type. Also, class X points might exhibit a stronger clustering, compared to class Y points,
and so might be more segregated compared to class Y points. See Ceyhan (2010) for more detail on the null
and alternative patterns.

NNCTs are constructed using the NN frequencies of classes. The construction of NNCTs for two classes
is described, e.g., in Ceyhan (2008); here we provide a brief description for $q > 2$ classes. Suppose there are
$q$ classes labeled as $\{1, 2, \ldots, q\}$. NNCTs are constructed using NN frequencies for each class. Let $N_i$ be the
number of points from class $i$ for $i \in \{1, 2, \ldots, q\}$ and $n = \sum_{i=1}^q N_i$. If we record the class of each point and
its nearest neighbor, the NN relationships fall into $q^2$ categories:

$$(1,1), (1,2), \ldots, (1,q);\ (2,1), (2,2), \ldots, (2,q);\ \ldots,\ (q,q)$$

where in category $(i,j)$, class $i$ is the base class, while class $j$ is the class of the NN. Denoting $N_{ij}$ as the
observed frequency of category $(i,j)$ for $i,j \in \{1, 2, \ldots, q\}$, we obtain the NNCT in Table 1 where $C_j$ is



                                  NN class
                       class 1     ...     class q     total
base class   class 1   $N_{11}$    ...     $N_{1q}$    $n_1$
               ...       ...       ...       ...        ...
             class q   $N_{q1}$    ...     $N_{qq}$    $n_q$
             total     $C_1$       ...     $C_q$       $n$

Table 1: The NNCT for q classes.



the sum of column $j$; i.e., the number of times class $j$ points serve as NNs, for $j \in \{1, 2, \ldots, q\}$. Note also that
$n = \sum_{i,j} N_{ij}$, $n_i = \sum_{j=1}^q N_{ij}$, and $C_j = \sum_{i=1}^q N_{ij}$. In cell $(i,j)$, class $i$ is called the base class, and class $j$ is
called the NN class. Here we adopt the convention that variables denoted by upper case letters are random
quantities, while variables denoted by lower case letters are fixed quantities. Thus, in NNCT-analysis, row sums
are assumed to be fixed (i.e., class sizes are given), while column sums are assumed to be random and depend
on the NN relationship between the classes.
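The construction of an NNCT from mapped data can be sketched as follows. This is a minimal illustration assuming NumPy, integer class labels coded $0, \ldots, q-1$ (instead of $1, \ldots, q$), Euclidean distance, and a brute-force NN search; the function name `nnct` is hypothetical. Ties in NN distances, which have probability zero for continuous data, are broken by taking the first minimizer.

```python
import numpy as np

def nnct(points, labels, q):
    """Return the q x q NNCT and the index of each point's nearest neighbor."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances (brute force, fine for moderate n).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)      # a point is not its own NN
    nn = dist.argmin(axis=1)            # nn[a] = index of the NN of point a
    table = np.zeros((q, q), dtype=int)
    # Cell (i, j): base point is from class i, its NN is from class j.
    np.add.at(table, (labels, labels[nn]), 1)
    return table, nn

# Five points on a line; classes 0, 0, 1, 1, 0.
pts = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [3.5, 0.0], [5.0, 0.0]]
lab = [0, 0, 1, 1, 0]
table, nn = nnct(pts, lab, q=2)
print(table)               # rows: base class, columns: NN class
print(table.sum(axis=1))   # row sums equal the class sizes n_i
print(table.sum(axis=0))   # column sums C_j
```

In this toy configuration the row sums reproduce the class sizes (3 and 2), while the column sums depend on which points happen to serve as NNs.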

Under CSR independence or RL, cell counts would be close to their expected values, while under
segregation the diagonal counts $N_{ii}$ would be larger, and under association the off-diagonal counts $N_{ij}$
would be larger than expected. When Pielou (1961) developed NNCT-tests, she used Pearson's $\chi^2$ test of
independence for testing segregation, which is not appropriate due to the dependence structure in a NNCT.
Dixon (1994) derived the correct asymptotic distribution of cell counts and hence the appropriate test, which
also has a $\chi^2$-distribution asymptotically. Ceyhan (2008) determines the conditions when Pielou's test is
appropriate, and when Dixon's test is appropriate, and discusses their use in practice.



2.1 Total, Strong, and Partial Segregation and Association 

When $H_o$ is rejected, if the diagonal entries ($N_{ii}$ values) tend to be higher than expected, there is (positive)
segregation; if the off-diagonal entries are larger than expected, there is association. These types of patterns
are easy to detect for $q = 2$ classes, but for $q > 2$, rejecting $H_o$ only indicates that there is some sort of
deviation from the null case, but with many possible directions, since rejecting $H_o$ only implies that for some
class $i$, there exist classes that are more likely to serve as NN to class $i$ or less likely to serve as NN to class
$i$ than expected under $H_o$. Let $\pi_{ij}$ be the probability of a point from class $j$ serving as NN to a point from
class $i$ given that the NN point is from class $j$ (i.e., it is a conditional probability). For example, for a fixed
class $i$, if $\pi_{ii} > \sum_{j \ne i} \pi_{ij}$, then we have total segregation of class $i$ from other classes; that is, class $i$ is more
likely to have a same class NN than all other classes combined. If $\pi_{ii} > \pi_{ij}$ for all $j \ne i$, then we have strong
segregation, which implies that class $i$ is more likely to have a conspecific NN compared to all other classes
one at a time. Notice that total segregation implies strong segregation.

For fixed classes $i$ and $j$, if $\pi_{ij} > \sum_{k \ne j} \pi_{ik}$, then we have total association of class $j$ with class $i$; that is,
class $j$ is more likely to be a NN of class $i$ than all other classes combined. If $\pi_{ij} > \pi_{ik}$ for all $k \ne j$, then we
have strong association of class $j$ with class $i$, which implies that class $j$ is more likely to be a NN of class $i$
compared to all other classes one at a time. Notice that total association implies strong association.

On the other hand, if $\pi_{ii} > \pi_{ij}$ for all $j \in S_1 \subset \{1, 2, \ldots, q\} \setminus \{i\}$ and $\pi_{ii} < \pi_{ij}$ for all $j \in S_2 =
\{1, 2, \ldots, q\} \setminus (S_1 \cup \{i\})$, then we say class $i$ is more segregated from the classes in $S_1$ and more associated with
the classes in $S_2$. Such cases are called partial segregation of class $i$ with respect to classes in $S_1$ and partial
association of class $i$ with classes in $S_2$.
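The labels above reduce to simple inequality checks on a matrix of the probabilities $\pi_{ij}$. The following is a minimal sketch, assuming the $\pi_{ij}$ are already available as a $q \times q$ array `pi` with classes coded $0, \ldots, q-1$; the numerical values and the function names are hypothetical and serve only to illustrate the definitions.

```python
import numpy as np

def segregation_type(pi, i):
    """Total / strong segregation of class i, from row i of pi."""
    pi = np.asarray(pi, dtype=float)
    others = np.delete(pi[i], i)                 # pi_ij for j != i
    return {"total": bool(pi[i, i] > others.sum()),
            "strong": bool(np.all(pi[i, i] > others))}

def association_type(pi, i, j):
    """Total / strong association of class j with class i."""
    pi = np.asarray(pi, dtype=float)
    others = np.delete(pi[i], j)                 # pi_ik for k != j
    return {"total": bool(pi[i, j] > others.sum()),
            "strong": bool(np.all(pi[i, j] > others))}

def partial_sets(pi, i):
    """S1: classes that i is more segregated from; S2: classes it is more associated with."""
    pi = np.asarray(pi, dtype=float)
    q = pi.shape[0]
    s1 = [j for j in range(q) if j != i and pi[i, i] > pi[i, j]]
    s2 = [j for j in range(q) if j != i and pi[i, i] < pi[i, j]]
    return s1, s2

# Hypothetical probabilities for q = 3 classes.
pi = [[0.60, 0.30, 0.10],
      [0.55, 0.25, 0.20],
      [0.25, 0.30, 0.45]]
print(segregation_type(pi, 0))     # class 0: total (hence strong) segregation
print(segregation_type(pi, 2))     # class 2: strong but not total segregation
print(association_type(pi, 1, 0))  # class 0 with class 1: total association
print(partial_sets(pi, 1))         # class 1: S1 = [2], S2 = [0]
```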



3 Cell-Specific Segregation Tests 



We describe cell-specific segregation tests of Dixon and introduce new cell-specific tests labeled as type I-IV 
cell-specific tests, henceforth. 



3.1 Dixon's Cell-Specific Segregation Tests 



Dixon's cell-specific tests are used to measure the deviation of the observed count in cell $(i,j)$ in a NNCT from
its expected value under $H_o$, and are described in detail in, e.g., Dixon (1994, 2002). The test statistic suggested by
Dixon for cell $(i,j)$ is given by

$$Z^D_{ij} = \frac{N_{ij} - \mathbf{E}[N_{ij}]}{\sqrt{\mathbf{Var}[N_{ij}]}}, \tag{1}$$

where $\mathbf{E}[N_{ij}]$ is the expected cell count and $\mathbf{Var}[N_{ij}]$ is the variance of cell count $N_{ij}$.

For $q > 2$ classes, under RL or CSR independence, the expected cell count for cell $(i,j)$ is

$$\mathbf{E}[N_{ij}] = \begin{cases} n_i (n_i - 1)/(n - 1) & \text{if } i = j, \\ n_i\, n_j/(n - 1) & \text{if } i \ne j, \end{cases} \tag{2}$$



where $n_i$ is the fixed sample size for class $i$ for $i = 1, 2, \ldots, q$. Observe that the expected cell counts depend
only on the size of each class (i.e., row sums), but not on column sums. And the variance is

$$\mathbf{Var}[N_{ij}] = \begin{cases} (n + R)\,p_{ii} + (2n - 2R + Q)\,p_{iii} + (n^2 - 3n - Q + R)\,p_{iiii} - (n\,p_{ii})^2 & \text{if } i = j, \\ n\,p_{ij} + Q\,p_{iij} + (n^2 - 3n - Q + R)\,p_{iijj} - (n\,p_{ij})^2 & \text{if } i \ne j, \end{cases} \tag{3}$$

where $p_{xx}$, $p_{xxx}$, and $p_{xxxx}$ are the probabilities that a randomly picked pair, triplet, or quartet of points,
respectively, are the indicated classes and are given by

$$\begin{aligned}
p_{ii} &= \frac{n_i (n_i - 1)}{n (n - 1)}, & p_{ij} &= \frac{n_i\, n_j}{n (n - 1)}, \\
p_{iii} &= \frac{n_i (n_i - 1)(n_i - 2)}{n (n - 1)(n - 2)}, & p_{iij} &= \frac{n_i (n_i - 1)\, n_j}{n (n - 1)(n - 2)}, \\
p_{iiii} &= \frac{n_i (n_i - 1)(n_i - 2)(n_i - 3)}{n (n - 1)(n - 2)(n - 3)}, & p_{iijj} &= \frac{n_i (n_i - 1)\, n_j (n_j - 1)}{n (n - 1)(n - 2)(n - 3)}.
\end{aligned} \tag{4}$$



Furthermore, $R$ is twice the number of reflexive pairs and $Q$ is the number of points with shared NNs, which
occurs when two or more points share a NN. Then $Q = 2\,(Q_2 + 3\,Q_3 + 6\,Q_4 + 10\,Q_5 + 15\,Q_6)$ where $Q_k$ is the
number of points that serve as a NN to other points $k$ times.
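Equations (1)-(4), together with $Q$ and $R$, can be turned into code fairly directly. The sketch below assumes NumPy, classes coded $0, \ldots, q-1$, and a vector `nn` holding the index of each point's NN; the function names `q_and_r` and `dixon_z` are hypothetical. It uses the identity $Q = \sum_k k(k-1)\,Q_k$, which matches the coefficients $2, 6, 12, 20, 30$ in the expression for $Q$ above, and it requires $n \ge 4$.

```python
import numpy as np

def q_and_r(nn):
    """Q and R from the NN index vector (nn[a] = index of the NN of point a)."""
    nn = np.asarray(nn)
    n = len(nn)
    # A point a is in a reflexive pair if the NN of its NN is a itself;
    # counting such points gives twice the number of reflexive pairs.
    R = int(np.sum(nn[nn] == np.arange(n)))
    # counts[a] = number of points that have a as their NN.
    counts = np.bincount(nn, minlength=n)
    Q = int(np.sum(counts * (counts - 1)))   # = 2(Q2 + 3Q3 + 6Q4 + 10Q5 + 15Q6)
    return Q, R

def dixon_z(table, Q, R):
    """Dixon's cell-specific z-scores, Equation (1), using Equations (2)-(4)."""
    N = np.asarray(table, dtype=float)
    ni = N.sum(axis=1)                       # fixed class sizes (row sums)
    n = ni.sum()
    q = len(ni)
    Z = np.empty((q, q))
    for i in range(q):
        for j in range(q):
            if i == j:
                p2 = ni[i] * (ni[i] - 1) / (n * (n - 1))
                p3 = p2 * (ni[i] - 2) / (n - 2)
                p4 = p3 * (ni[i] - 3) / (n - 3)
                mean = ni[i] * (ni[i] - 1) / (n - 1)
                var = ((n + R) * p2 + (2 * n - 2 * R + Q) * p3
                       + (n * n - 3 * n - Q + R) * p4 - (n * p2) ** 2)
            else:
                p2 = ni[i] * ni[j] / (n * (n - 1))
                p3 = ni[i] * (ni[i] - 1) * ni[j] / (n * (n - 1) * (n - 2))
                p4 = (ni[i] * (ni[i] - 1) * ni[j] * (ni[j] - 1)
                      / (n * (n - 1) * (n - 2) * (n - 3)))
                mean = ni[i] * ni[j] / (n - 1)
                var = (n * p2 + Q * p3
                       + (n * n - 3 * n - Q + R) * p4 - (n * p2) ** 2)
            Z[i, j] = (N[i, j] - mean) / np.sqrt(var)
    return Z

# Toy input: five points with NN indices nn and the resulting 2 x 2 NNCT.
nn = [1, 0, 3, 2, 3]
Q, R = q_and_r(nn)
Z = dixon_z([[2, 1], [0, 2]], Q, R)
print(Q, R)
print(Z)
```

With only five points the normal approximation is of course meaningless; the toy input is there only to make the arithmetic easy to follow by hand.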



3.2 Type I Cell-Specific Segregation Tests 

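In standard cases like multinomial sampling for contingency tables with fixed row totals and conditioning on
the column totals, $C_j = c_j$, the expected cell count for cell $(i,j)$ in contingency tables is $\mathbf{E}[N_{ij}] = n_i\, c_j / n$. We
first consider the difference $N_{ij} - n_i\, c_j / n$ for cell $(i,j)$. However, under RL, $N_i = n_i$ are fixed, but $C_j$ are random
quantities and $C_j = \sum_{i=1}^q N_{ij}$; hence we suggest the first type of cell-specific segregation test statistic as

$$T^{I}_{ij} = N_{ij} - \frac{n_i\, C_j}{n}.$$

Then under RL,

$$\mathbf{E}\left[T^{I}_{ij}\right] = \begin{cases} \dfrac{n_i (n_i - 1)}{(n - 1)} - \dfrac{n_i}{n}\, \mathbf{E}[C_i] & \text{if } i = j, \\[2ex] \dfrac{n_i\, n_j}{(n - 1)} - \dfrac{n_i}{n}\, \mathbf{E}[C_j] & \text{if } i \ne j. \end{cases} \tag{5}$$

For all $j$, $\mathbf{E}[C_j] = n_j$, since

$$\mathbf{E}[C_j] = \sum_{i=1}^q \mathbf{E}[N_{ij}] = \frac{n_j (n_j - 1)}{(n - 1)} + \sum_{i \ne j} \frac{n_i\, n_j}{(n - 1)} = \frac{n_j (n_j - 1)}{(n - 1)} + \frac{n_j}{(n - 1)}\,(n - n_j) = n_j.$$

Therefore,

$$\mathbf{E}\left[T^{I}_{ij}\right] = \begin{cases} \dfrac{n_i (n_i - n)}{n (n - 1)} & \text{if } i = j, \\[2ex] \dfrac{n_i\, n_j}{n (n - 1)} & \text{if } i \ne j. \end{cases} \tag{6}$$

For the variance of $T^{I}_{ij}$, we have

$$\mathbf{Var}\left[T^{I}_{ij}\right] = \mathbf{Var}[N_{ij}] + \frac{n_i^2}{n^2}\, \mathbf{Var}[C_j] - 2\, \frac{n_i}{n}\, \mathbf{Cov}[N_{ij}, C_j], \tag{7}$$

where $\mathbf{Var}[N_{ij}]$ are as in Equation (3), $\mathbf{Var}[C_j] = \sum_{i=1}^q \mathbf{Var}[N_{ij}] + \sum_{k \ne i} \sum_{i} \mathbf{Cov}[N_{ij}, N_{kj}]$, and $\mathbf{Cov}[N_{ij}, C_j] =
\sum_{k=1}^q \mathbf{Cov}[N_{ij}, N_{kj}]$, with $\mathbf{Cov}[N_{ij}, N_{kl}]$ as in Equations (4)-(12) of Dixon (2002).

As a new cell-specific test, we propose

$$Z^{I}_{ij} = \frac{T^{I}_{ij} - \mathbf{E}\left[T^{I}_{ij}\right]}{\sqrt{\mathbf{Var}\left[T^{I}_{ij}\right]}}. \tag{8}$$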



3.3 Type II Cell-Specific Segregation Tests 

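In Section 3.2, we suggested $N_{ij} - n_i\, C_j / n$ as the test statistic for cell $(i,j)$. However, under RL, $\mathbf{E}[C_j] = n_j$,
so we suggest the second type of segregation test statistic as

$$T^{II}_{ij} = N_{ij} - \frac{n_i\, n_j}{n}.$$

Then under RL, $\mathbf{E}\left[T^{II}_{ij}\right] = \mathbf{E}\left[T^{I}_{ij}\right]$, which is provided in Equation (6). Moreover, the variance of $T^{II}_{ij}$ is
$\mathbf{Var}\left[T^{II}_{ij}\right] = \mathbf{Var}[N_{ij}]$, since $n_i$, $n_j$, and $n$ are fixed.

As a cell-specific test, we propose

$$Z^{II}_{ij} = \frac{T^{II}_{ij} - \mathbf{E}\left[T^{II}_{ij}\right]}{\sqrt{\mathbf{Var}\left[T^{II}_{ij}\right]}}. \tag{9}$$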



3.4 Type III Cell-Specific Segregation Tests 

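In the previous sections, $\mathbf{E}\left[T^{I}_{ij}\right] = \mathbf{E}\left[T^{II}_{ij}\right] \ne 0$ under RL. Hence, instead of these test statistics, in order to
obtain zero expected value for our test statistic, we suggest the following:

$$T^{III}_{ij} = \begin{cases} N_{ii} - \dfrac{(n_i - 1)}{(n - 1)}\, C_i & \text{if } i = j, \\[2ex] N_{ij} - \dfrac{n_i}{(n - 1)}\, C_j & \text{if } i \ne j. \end{cases} \tag{10}$$

Then $\mathbf{E}\left[T^{III}_{ij}\right] = 0$, since, for $i = j$,

$$\mathbf{E}\left[T^{III}_{ii}\right] = \mathbf{E}[N_{ii}] - \frac{(n_i - 1)}{(n - 1)}\, \mathbf{E}[C_i] = \frac{n_i (n_i - 1)}{(n - 1)} - \frac{(n_i - 1)\, n_i}{(n - 1)} = 0,$$

and for $i \ne j$,

$$\mathbf{E}\left[T^{III}_{ij}\right] = \mathbf{E}[N_{ij}] - \frac{n_i}{(n - 1)}\, \mathbf{E}[C_j] = \frac{n_i\, n_j}{(n - 1)} - \frac{n_i\, n_j}{(n - 1)} = 0.$$

As for the variance of $T^{III}_{ij}$, we have

$$\mathbf{Var}\left[T^{III}_{ij}\right] = \begin{cases} \mathbf{Var}[N_{ij}] + \dfrac{(n_i - 1)^2}{(n - 1)^2}\, \mathbf{Var}[C_j] - 2\, \dfrac{(n_i - 1)}{(n - 1)}\, \mathbf{Cov}[N_{ij}, C_j] & \text{if } i = j, \\[2ex] \mathbf{Var}[N_{ij}] + \dfrac{n_i^2}{(n - 1)^2}\, \mathbf{Var}[C_j] - 2\, \dfrac{n_i}{(n - 1)}\, \mathbf{Cov}[N_{ij}, C_j] & \text{if } i \ne j. \end{cases} \tag{11}$$

As a new cell-specific test, we propose

$$Z^{III}_{ij} = \frac{T^{III}_{ij}}{\sqrt{\mathbf{Var}\left[T^{III}_{ij}\right]}}. \tag{12}$$

Notice that this is the same as the new cell-specific test introduced in Ceyhan (2010), and the details of this
test are provided for the sake of completeness.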

3.5 Type IV Cell-Specific Segregation Tests 

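For $T^{III}_{ij}$, we introduced a coefficient in front of the second term, i.e., $n_i\, C_j / n$, to obtain a zero expected value
for our statistic under RL. In this section, we modify the first term and obtain the following test statistic:

$$T^{IV}_{ij} = \begin{cases} \dfrac{n_i}{n} \left( \dfrac{n - 1}{n_i - 1}\, N_{ii} - C_i \right) & \text{if } i = j, \\[2ex] \dfrac{1}{n} \left( (n - 1)\, N_{ij} - n_i\, C_j \right) & \text{if } i \ne j. \end{cases} \tag{13}$$

Then $\mathbf{E}\left[T^{IV}_{ij}\right] = 0$, since, for $i = j$,

$$\mathbf{E}\left[T^{IV}_{ii}\right] = \frac{n_i}{n} \left( \frac{n - 1}{n_i - 1}\, \mathbf{E}[N_{ii}] - \mathbf{E}[C_i] \right) = \frac{n_i}{n} \left( \frac{n - 1}{n_i - 1} \cdot \frac{n_i (n_i - 1)}{n - 1} - n_i \right) = 0,$$

and for $i \ne j$,

$$\mathbf{E}\left[T^{IV}_{ij}\right] = \frac{1}{n} \left( (n - 1)\, \mathbf{E}[N_{ij}] - n_i\, \mathbf{E}[C_j] \right) = \frac{1}{n} \left( n_i\, n_j - n_i\, n_j \right) = 0.$$

As for the variance of $T^{IV}_{ij}$, we have

$$\mathbf{Var}\left[T^{IV}_{ij}\right] = \begin{cases} \dfrac{n_i^2}{n^2} \left( \dfrac{(n - 1)^2}{(n_i - 1)^2}\, \mathbf{Var}[N_{ij}] + \mathbf{Var}[C_j] - 2\, \dfrac{(n - 1)}{(n_i - 1)}\, \mathbf{Cov}[N_{ij}, C_j] \right) & \text{if } i = j, \\[2ex] \dfrac{1}{n^2} \left( (n - 1)^2\, \mathbf{Var}[N_{ij}] + n_i^2\, \mathbf{Var}[C_j] - 2 (n - 1)\, n_i\, \mathbf{Cov}[N_{ij}, C_j] \right) & \text{if } i \ne j. \end{cases} \tag{14}$$

As a new cell-specific test, we propose

$$Z^{IV}_{ij} = \frac{T^{IV}_{ij}}{\sqrt{\mathbf{Var}\left[T^{IV}_{ij}\right]}}. \tag{15}$$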



4 Overall Segregation Tests 

In this section, we describe the overall segregation tests in the literature and introduce new overall tests based
on the cell-specific tests in Section 3.



4.1 Dixon's Overall Segregation Test 



In the multi-class case with $q$ classes, combining the $q^2$ cell-specific tests in Section 3.1, Dixon (2002) suggests
the quadratic form to obtain the overall segregation test as follows:

$$C_D = (\mathbf{N} - \mathbf{E}[\mathbf{N}])'\, \Sigma_D^{-}\, (\mathbf{N} - \mathbf{E}[\mathbf{N}]) \tag{16}$$

where $\mathbf{N}$ is the $q^2 \times 1$ vector of $q$ rows of the NNCT concatenated row-wise, $\mathbf{E}[\mathbf{N}]$ is the vector of $\mathbf{E}[N_{ij}]$
which are as in Equation (2), and $\Sigma_D$ is the $q^2 \times q^2$ variance-covariance matrix for the cell count vector $\mathbf{N}$ with
diagonal entries equal to $\mathbf{Var}[N_{ij}]$ and off-diagonal entries being $\mathbf{Cov}[N_{ij}, N_{kl}]$ for $(i,j) \ne (k,l)$. The explicit
forms of the variance and covariance terms are provided in Dixon (2002). Also, $\Sigma_D^{-}$ is a generalized inverse
of $\Sigma_D$ (Searle (2006)) and $'$ stands for the transpose of a vector or matrix. Then under RL, $C_D$ has a $\chi^2_{q(q-1)}$
distribution asymptotically.
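A quadratic form with a generalized inverse, as in Equation (16), can be evaluated as in the following sketch. It assumes NumPy and uses the Moore-Penrose pseudoinverse (`numpy.linalg.pinv`) as one particular choice of generalized inverse; the function name `overall_statistic` and the toy covariance matrix are hypothetical. The variance-covariance matrix itself must be supplied by the user, e.g., from the formulas in Dixon (2002).

```python
import numpy as np

def overall_statistic(t, expected, sigma):
    """Quadratic form (t - E[t])' Sigma^- (t - E[t]) and the numerical rank of Sigma."""
    d = np.asarray(t, dtype=float).ravel() - np.asarray(expected, dtype=float).ravel()
    sigma = np.asarray(sigma, dtype=float)
    sigma_ginv = np.linalg.pinv(sigma)       # Moore-Penrose generalized inverse
    stat = float(d @ sigma_ginv @ d)
    rank = int(np.linalg.matrix_rank(sigma)) # candidate degrees of freedom
    return stat, rank

# Toy singular covariance: two perfectly negatively correlated components.
sigma = [[1.0, -1.0], [-1.0, 1.0]]
stat, rank = overall_statistic([0.5, -0.5], [0.0, 0.0], sigma)
print(stat, rank)
```

In the toy example the two components carry the same information, so the statistic equals the square of the single standardized deviation ($0.5^2$) and the rank is 1; this mirrors why the degrees of freedom of the overall tests are tied to the rank of the variance-covariance matrix rather than to $q^2$.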



4.2 Type I Overall Segregation Test 



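We can also combine the type I cell-specific tests of Section 3.2. Let $\mathbf{T}^{I}$ be the vector of $q^2$ $T^{I}_{ij}$ values, i.e.,

$$\mathbf{T}^{I} = \left( T^{I}_{11}, T^{I}_{12}, \ldots, T^{I}_{1q}, T^{I}_{21}, T^{I}_{22}, \ldots, T^{I}_{2q}, \ldots, T^{I}_{qq} \right)',$$

and let $\mathbf{E}\left[\mathbf{T}^{I}\right]$ be the vector of $\mathbf{E}\left[T^{I}_{ij}\right]$ values, i.e.,
$\mathbf{E}\left[\mathbf{T}^{I}\right] = \left( \mathbf{E}\left[T^{I}_{11}\right], \mathbf{E}\left[T^{I}_{12}\right], \ldots, \mathbf{E}\left[T^{I}_{1q}\right], \mathbf{E}\left[T^{I}_{21}\right], \mathbf{E}\left[T^{I}_{22}\right], \ldots, \mathbf{E}\left[T^{I}_{2q}\right], \ldots, \mathbf{E}\left[T^{I}_{qq}\right] \right)'$.
Hence to obtain a new overall segregation test, referred to as type I overall test, we
use the following quadratic form:

$$C_I = \left( \mathbf{T}^{I} - \mathbf{E}\left[\mathbf{T}^{I}\right] \right)' \Sigma_I^{-} \left( \mathbf{T}^{I} - \mathbf{E}\left[\mathbf{T}^{I}\right] \right) \tag{17}$$

where $\Sigma_I$ is the $q^2 \times q^2$ variance-covariance matrix of $\mathbf{T}^{I}$.

Under RL, the diagonal entries in the variance-covariance matrix $\Sigma_I$ are $\mathbf{Var}\left[T^{I}_{ij}\right]$, which are provided in
Equation (7). For the off-diagonal entries in $\Sigma_I$, i.e., $\mathbf{Cov}\left[T^{I}_{ij}, T^{I}_{kl}\right]$ with $(i,j) \ne (k,l)$, there are four cases to
consider:

case 1: $i = j$ and $k = l$, then

$$\begin{aligned} \mathbf{Cov}\left[T^{I}_{ii}, T^{I}_{kk}\right] &= \mathbf{Cov}\left[ N_{ii} - \frac{n_i}{n}\, C_i,\ N_{kk} - \frac{n_k}{n}\, C_k \right] \\ &= \mathbf{Cov}[N_{ii}, N_{kk}] - \frac{n_k}{n}\, \mathbf{Cov}[N_{ii}, C_k] - \frac{n_i}{n}\, \mathbf{Cov}[N_{kk}, C_i] + \frac{n_i\, n_k}{n^2}\, \mathbf{Cov}[C_i, C_k]. \end{aligned} \tag{18}$$

case 2: $i = j$ and $k \ne l$, then

$$\begin{aligned} \mathbf{Cov}\left[T^{I}_{ii}, T^{I}_{kl}\right] &= \mathbf{Cov}\left[ N_{ii} - \frac{n_i}{n}\, C_i,\ N_{kl} - \frac{n_k}{n}\, C_l \right] \\ &= \mathbf{Cov}[N_{ii}, N_{kl}] - \frac{n_k}{n}\, \mathbf{Cov}[N_{ii}, C_l] - \frac{n_i}{n}\, \mathbf{Cov}[N_{kl}, C_i] + \frac{n_i\, n_k}{n^2}\, \mathbf{Cov}[C_i, C_l]. \end{aligned} \tag{19}$$

case 3: $i \ne j$ and $k = l$, then $\mathbf{Cov}\left[T^{I}_{ij}, T^{I}_{kk}\right] = \mathbf{Cov}\left[T^{I}_{kk}, T^{I}_{ij}\right]$, which is essentially case 2 above.

case 4: $i \ne j$ and $k \ne l$, then

$$\begin{aligned} \mathbf{Cov}\left[T^{I}_{ij}, T^{I}_{kl}\right] &= \mathbf{Cov}\left[ N_{ij} - \frac{n_i}{n}\, C_j,\ N_{kl} - \frac{n_k}{n}\, C_l \right] \\ &= \mathbf{Cov}[N_{ij}, N_{kl}] - \frac{n_k}{n}\, \mathbf{Cov}[N_{ij}, C_l] - \frac{n_i}{n}\, \mathbf{Cov}[N_{kl}, C_j] + \frac{n_i\, n_k}{n^2}\, \mathbf{Cov}[C_j, C_l]. \end{aligned} \tag{20}$$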



4.3 Type II Overall Segregation Test 

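We also combine the type II cell-specific tests of Section 3.3. Let $\mathbf{T}^{II}$ be the vector of $q^2$ $T^{II}_{ij}$ values, i.e.,

$$\mathbf{T}^{II} = \left( T^{II}_{11}, T^{II}_{12}, \ldots, T^{II}_{1q}, T^{II}_{21}, T^{II}_{22}, \ldots, T^{II}_{2q}, \ldots, T^{II}_{qq} \right)',$$

and let $\mathbf{E}\left[\mathbf{T}^{II}\right]$ be the vector of $\mathbf{E}\left[T^{II}_{ij}\right]$ values. As the type II overall segregation test, we use the following
quadratic form:

$$C_{II} = \left( \mathbf{T}^{II} - \mathbf{E}\left[\mathbf{T}^{II}\right] \right)' \Sigma_{II}^{-} \left( \mathbf{T}^{II} - \mathbf{E}\left[\mathbf{T}^{II}\right] \right) \tag{21}$$

where $\Sigma_{II}$ is the $q^2 \times q^2$ variance-covariance matrix of $\mathbf{T}^{II}$.

Under RL, the diagonal entries in the variance-covariance matrix $\Sigma_{II}$ are $\mathbf{Var}\left[T^{II}_{ij}\right]$, which are the same as
$\mathbf{Var}[N_{ij}]$. For the off-diagonal entries in $\Sigma_{II}$, i.e., $\mathbf{Cov}\left[T^{II}_{ij}, T^{II}_{kl}\right]$ with $(i,j) \ne (k,l)$, we have
$\mathbf{Cov}\left[T^{II}_{ij}, T^{II}_{kl}\right] = \mathbf{Cov}\left[ N_{ij} - \frac{n_i\, n_j}{n},\ N_{kl} - \frac{n_k\, n_l}{n} \right] = \mathbf{Cov}[N_{ij}, N_{kl}]$.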



4.4 Type III Overall Segregation Test 



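When we combine the type III cell-specific tests of Section 3.4, we obtain the type III overall test as follows. Let
$\mathbf{T}^{III}$ be the vector of $q^2$ $T^{III}_{ij}$ values, i.e.,

$$\mathbf{T}^{III} = \left( T^{III}_{11}, T^{III}_{12}, \ldots, T^{III}_{1q}, T^{III}_{21}, T^{III}_{22}, \ldots, T^{III}_{2q}, \ldots, T^{III}_{qq} \right)',$$

and let $\mathbf{E}\left[\mathbf{T}^{III}\right]$ be the vector of $\mathbf{E}\left[T^{III}_{ij}\right]$ values. Note that $\mathbf{E}\left[\mathbf{T}^{III}\right] = \mathbf{0}$. As the type III overall segregation
test, we use the following quadratic form:

$$C_{III} = \left( \mathbf{T}^{III} \right)' \Sigma_{III}^{-} \left( \mathbf{T}^{III} \right) \tag{22}$$

where $\Sigma_{III}$ is the $q^2 \times q^2$ variance-covariance matrix of $\mathbf{T}^{III}$.

Under RL, the diagonal entries in the variance-covariance matrix $\Sigma_{III}$ are $\mathbf{Var}\left[T^{III}_{ij}\right]$, which are provided
in Equation (11). For the off-diagonal entries in $\Sigma_{III}$, i.e., $\mathbf{Cov}\left[T^{III}_{ij}, T^{III}_{kl}\right]$ with $(i,j) \ne (k,l)$, there are four
cases to consider:

case 1: $i = j$ and $k = l$, then

$$\begin{aligned} \mathbf{Cov}\left[T^{III}_{ii}, T^{III}_{kk}\right] &= \mathbf{Cov}\left[ N_{ii} - \frac{(n_i - 1)}{(n - 1)}\, C_i,\ N_{kk} - \frac{(n_k - 1)}{(n - 1)}\, C_k \right] \\ &= \mathbf{Cov}[N_{ii}, N_{kk}] - \frac{(n_k - 1)}{(n - 1)}\, \mathbf{Cov}[N_{ii}, C_k] - \frac{(n_i - 1)}{(n - 1)}\, \mathbf{Cov}[N_{kk}, C_i] + \frac{(n_i - 1)(n_k - 1)}{(n - 1)^2}\, \mathbf{Cov}[C_i, C_k]. \end{aligned} \tag{23}$$

case 2: $i = j$ and $k \ne l$, then

$$\begin{aligned} \mathbf{Cov}\left[T^{III}_{ii}, T^{III}_{kl}\right] &= \mathbf{Cov}\left[ N_{ii} - \frac{(n_i - 1)}{(n - 1)}\, C_i,\ N_{kl} - \frac{n_k}{(n - 1)}\, C_l \right] \\ &= \mathbf{Cov}[N_{ii}, N_{kl}] - \frac{n_k}{(n - 1)}\, \mathbf{Cov}[N_{ii}, C_l] - \frac{(n_i - 1)}{(n - 1)}\, \mathbf{Cov}[N_{kl}, C_i] + \frac{(n_i - 1)\, n_k}{(n - 1)^2}\, \mathbf{Cov}[C_i, C_l]. \end{aligned} \tag{24}$$

case 3: $i \ne j$ and $k = l$, then $\mathbf{Cov}\left[T^{III}_{ij}, T^{III}_{kk}\right] = \mathbf{Cov}\left[T^{III}_{kk}, T^{III}_{ij}\right]$, which is essentially case 2 above.

case 4: $i \ne j$ and $k \ne l$, then

$$\begin{aligned} \mathbf{Cov}\left[T^{III}_{ij}, T^{III}_{kl}\right] &= \mathbf{Cov}\left[ N_{ij} - \frac{n_i}{(n - 1)}\, C_j,\ N_{kl} - \frac{n_k}{(n - 1)}\, C_l \right] \\ &= \mathbf{Cov}[N_{ij}, N_{kl}] - \frac{n_k}{(n - 1)}\, \mathbf{Cov}[N_{ij}, C_l] - \frac{n_i}{(n - 1)}\, \mathbf{Cov}[N_{kl}, C_j] + \frac{n_i\, n_k}{(n - 1)^2}\, \mathbf{Cov}[C_j, C_l]. \end{aligned} \tag{25}$$

Note that the type III overall segregation test is the same as the new overall test provided in Ceyhan (2010).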



4.5 Type IV Overall Segregation Test 

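When we combine the type IV cell-specific tests of Section 3.5, we obtain the type IV overall test as follows. Let
$\mathbf{T}^{IV}$ be the vector of $q^2$ $T^{IV}_{ij}$ values, i.e.,

$$\mathbf{T}^{IV} = \left( T^{IV}_{11}, T^{IV}_{12}, \ldots, T^{IV}_{1q}, T^{IV}_{21}, T^{IV}_{22}, \ldots, T^{IV}_{2q}, \ldots, T^{IV}_{qq} \right)',$$

and let $\mathbf{E}\left[\mathbf{T}^{IV}\right]$ be the vector of $\mathbf{E}\left[T^{IV}_{ij}\right]$ values. Note that $\mathbf{E}\left[\mathbf{T}^{IV}\right] = \mathbf{0}$. As the type IV overall segregation
test, we use the following quadratic form:

$$C_{IV} = \left( \mathbf{T}^{IV} \right)' \Sigma_{IV}^{-} \left( \mathbf{T}^{IV} \right) \tag{26}$$

where $\Sigma_{IV}$ is the $q^2 \times q^2$ variance-covariance matrix of $\mathbf{T}^{IV}$.

Under RL, the diagonal entries in the variance-covariance matrix $\Sigma_{IV}$ are $\mathbf{Var}\left[T^{IV}_{ij}\right]$, which are provided
in Equation (14). For the off-diagonal entries in $\Sigma_{IV}$, i.e., $\mathbf{Cov}\left[T^{IV}_{ij}, T^{IV}_{kl}\right]$ with $(i,j) \ne (k,l)$, there are four
cases to consider:

case 1: $i = j$ and $k = l$, then

$$\begin{aligned} \mathbf{Cov}\left[T^{IV}_{ii}, T^{IV}_{kk}\right] &= \mathbf{Cov}\left[ \frac{n_i}{n} \left( \frac{n - 1}{n_i - 1}\, N_{ii} - C_i \right),\ \frac{n_k}{n} \left( \frac{n - 1}{n_k - 1}\, N_{kk} - C_k \right) \right] \\ &= \frac{n_i\, n_k}{n^2}\, \mathbf{Cov}\left[ \frac{n - 1}{n_i - 1}\, N_{ii} - C_i,\ \frac{n - 1}{n_k - 1}\, N_{kk} - C_k \right] \\ &= \frac{n_i\, n_k}{n^2} \left( \frac{(n - 1)^2}{(n_i - 1)(n_k - 1)}\, \mathbf{Cov}[N_{ii}, N_{kk}] - \frac{(n - 1)}{(n_i - 1)}\, \mathbf{Cov}[N_{ii}, C_k] - \frac{(n - 1)}{(n_k - 1)}\, \mathbf{Cov}[N_{kk}, C_i] + \mathbf{Cov}[C_i, C_k] \right). \end{aligned} \tag{27}$$

case 2: $i = j$ and $k \ne l$, then

$$\begin{aligned} \mathbf{Cov}\left[T^{IV}_{ii}, T^{IV}_{kl}\right] &= \mathbf{Cov}\left[ \frac{n_i}{n} \left( \frac{n - 1}{n_i - 1}\, N_{ii} - C_i \right),\ \frac{1}{n} \left( (n - 1)\, N_{kl} - n_k\, C_l \right) \right] \\ &= \frac{n_i}{n^2}\, \mathbf{Cov}\left[ \frac{n - 1}{n_i - 1}\, N_{ii} - C_i,\ (n - 1)\, N_{kl} - n_k\, C_l \right] \\ &= \frac{n_i}{n^2} \left( \frac{(n - 1)^2}{(n_i - 1)}\, \mathbf{Cov}[N_{ii}, N_{kl}] - \frac{(n - 1)\, n_k}{(n_i - 1)}\, \mathbf{Cov}[N_{ii}, C_l] - (n - 1)\, \mathbf{Cov}[N_{kl}, C_i] + n_k\, \mathbf{Cov}[C_i, C_l] \right). \end{aligned} \tag{28}$$

case 3: $i \ne j$ and $k = l$, then $\mathbf{Cov}\left[T^{IV}_{ij}, T^{IV}_{kk}\right] = \mathbf{Cov}\left[T^{IV}_{kk}, T^{IV}_{ij}\right]$, which is essentially case 2 above.

case 4: $i \ne j$ and $k \ne l$, then

$$\begin{aligned} \mathbf{Cov}\left[T^{IV}_{ij}, T^{IV}_{kl}\right] &= \mathbf{Cov}\left[ \frac{1}{n} \left( (n - 1)\, N_{ij} - n_i\, C_j \right),\ \frac{1}{n} \left( (n - 1)\, N_{kl} - n_k\, C_l \right) \right] \\ &= \frac{1}{n^2}\, \mathbf{Cov}\left[ (n - 1)\, N_{ij} - n_i\, C_j,\ (n - 1)\, N_{kl} - n_k\, C_l \right] \\ &= \frac{1}{n^2} \left( (n - 1)^2\, \mathbf{Cov}[N_{ij}, N_{kl}] - (n - 1)\, n_k\, \mathbf{Cov}[N_{ij}, C_l] - (n - 1)\, n_i\, \mathbf{Cov}[N_{kl}, C_j] + n_i\, n_k\, \mathbf{Cov}[C_j, C_l] \right). \end{aligned} \tag{29}$$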



4.6 Remarks on NNCT-Tests 



Under RL, $Z^D_{ii}$ is shown to have $N(0, 1)$ distribution asymptotically, while for $q > 2$ the asymptotic normality
of the off-diagonal cells in NNCTs is not rigorously established, although extensive Monte Carlo simulations
indicate approximate normality for large samples (Dixon (2002)). Furthermore, Dixon's cell-specific test and
type II cell-specific test are equivalent, and so are types III and IV cell-specific tests.

Asymptotically, under RL, $C_D$ and $C_{II}$ have a $\chi^2_{q(q-1)}$ distribution since the rank of $\Sigma_D$ and $\Sigma_{II}$ is $q(q - 1)$;
in fact, $\Sigma_D = \Sigma_{II}$. $C_{III}$ has a $\chi^2_{(q-1)^2}$ distribution since the rank of $\Sigma_{III}$ is $(q - 1)^2$. However, our Monte Carlo
simulations suggest that $C_I$ has a $\chi^2_\nu$ distribution where $\nu = (q - 1)^2 - 1/2$, while $C_{IV}$ has a $\chi^2_\nu$ distribution
where $\nu = q^2 - 2$. Furthermore, the cell-specific test statistics are dependent, hence their squares do not sum
to the corresponding overall segregation tests. Among the overall tests, Dixon's test and type II overall test
are identical (the small differences in practice are due to rounding errors in computations).



In all the above cases, $\mathbf{Cov}[N_{ij}, N_{kl}]$ are as in Dixon (2002), $\mathbf{Cov}[N_{ij}, C_l] = \sum_{k=1}^q \mathbf{Cov}[N_{ij}, N_{kl}]$, and
$\mathbf{Cov}[C_i, C_j] = \sum_{k=1}^q \sum_{l=1}^q \mathbf{Cov}[N_{ki}, N_{lj}]$.
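Because each column sum is $C_j = \sum_{k} N_{kj}$, every type I, III, and IV statistic is a linear function of the cell count vector $\mathbf{N}$, say $\mathbf{T} = A\,\mathbf{N}$, so the case-by-case covariance formulas can be summarized as $A\, \Sigma_D\, A'$. The sketch below illustrates this observation; it is an equivalent matrix reformulation rather than the procedure described in the report. It assumes NumPy, classes coded $0, \ldots, q-1$, row-wise ordering of cells, $n_i \ge 2$, and a user-supplied matrix `sigma_N` playing the role of $\Sigma_D$; all function names are hypothetical.

```python
import numpy as np

def coefficients(ni, kind):
    """Coefficient of N_ij (first) and of C_j (second, subtracted) in T_ij."""
    ni = np.asarray(ni, dtype=float)
    n, q = ni.sum(), len(ni)
    row = np.repeat(ni[:, None], q, axis=1)    # row[i, j] = n_i
    diag = np.eye(q, dtype=bool)
    ones = np.ones((q, q))
    if kind == "I":
        return ones, row / n
    if kind == "III":
        return ones, np.where(diag, (row - 1) / (n - 1), row / (n - 1))
    if kind == "IV":
        first = np.where(diag, row * (n - 1) / (n * (row - 1)), (n - 1) / n)
        return first, row / n
    raise ValueError("kind must be 'I', 'III' or 'IV'")

def transform_matrix(first, second):
    """Matrix A with T = A N, where cell (i, j) sits at position i*q + j."""
    q = first.shape[0]
    A = np.zeros((q * q, q * q))
    for i in range(q):
        for j in range(q):
            r = i * q + j
            A[r, r] += first[i, j]             # coefficient of N_ij itself
            for k in range(q):                 # C_j = sum over k of N_kj
                A[r, k * q + j] -= second[i, j]
    return A

def covariance_of_T(sigma_N, ni, kind):
    """Variance-covariance matrix of the type I, III or IV statistics."""
    A = transform_matrix(*coefficients(ni, kind))
    return A @ np.asarray(sigma_N, dtype=float) @ A.T

# Placeholder covariance for illustration only (NOT Dixon's covariance matrix).
rng = np.random.default_rng(1)
B = rng.normal(size=(9, 9))
sigma_N = B @ B.T
sigma_I = covariance_of_T(sigma_N, [4, 5, 6], "I")
sigma_III = covariance_of_T(sigma_N, [4, 5, 6], "III")
sigma_IV = covariance_of_T(sigma_N, [4, 5, 6], "IV")
print(sigma_I.shape)
```

One design consequence is that only $\Sigma_D$ needs to be coded from Dixon (2002); the remaining matrices follow by matrix multiplication, which also guarantees that they are symmetric.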



Under CSR independence, the cell-specific and overall tests are as in the RL case. However, under RL, $Q$ and
$R$ are fixed quantities, as they depend only on the location of the points, but not the types of NNs, while
under CSR independence, they are random. Under CSR independence, the distribution of the test statistics
above is similar to the RL case. The only difference is that the new cell-specific tests asymptotically have
$N(0, 1)$ distribution conditional on $Q$ and $R$. Hence, under CSR independence, $\mathbf{Var}[N_{ij}]$, $\mathbf{Cov}[N_{ij}, N_{kl}]$,
$\mathbf{Cov}[N_{ij}, C_k]$, $\mathbf{Cov}[C_i, C_j]$, and all other quantities depending on $Q$ and $R$ are conditional on $Q$ and $R$. The
unconditional variances can be obtained by replacing $Q$ and $R$ with their expectations (see Ceyhan (2010) for
more detail). Since $Q$ and $R$ are random under CSR independence, the variances of the test statistics tend
to be larger compared to the ones under RL.

Each of the cell-specific tests measures the deviation of the test statistic from its expected value under
$H_o$. Dixon's and type II cell-specific tests depend on $N_{ij}$ (i.e., cell counts) and row sums only, and types
I, III, and IV cell-specific tests incorporate column sums as well. For the cell-specific tests, the z-score for
cell $(i,j)$ indicates the level and direction of spatial interaction between classes $i$ and $j$. If the z-score for
cell $(i,i)$ is significantly larger (less) than zero, then class $i$ exhibits (lack of) segregation from other classes.
If the z-score for cell $(i,j)$ with $i \ne j$ is significantly larger (less) than zero, then class $j$ exhibits (lack of)
association with class $i$. Moreover, for cells $(i,j)$ with $i \ne j$, the cell-specific tests are not symmetric. For
example, the cell-specific test for cell $(i,j)$ may exhibit a different level of interaction compared to the cell
$(j,i)$. The overall tests combine cell-specific tests in one compound summary statistic. The performance of
cell-specific tests is expected to carry over to the overall tests, provided the correct degrees of freedom are
determined.



Recall that in the two-class case, each cell count $N_{ij}$ has asymptotic normal distribution (Cuzick and Edwards
(1990)). Hence, the new cell-specific tests $Z^{I}_{ij}$, $Z^{II}_{ij}$, $Z^{III}_{ij}$, and $Z^{IV}_{ij}$ also converge in law to $N(0, 1)$ as $n \to \infty$.
Moreover, one and two-sided versions of these tests are also possible. In the two-class case, usually only two
cells contain all the information provided by the NNCT. In particular, segregation of class $i$ from class $j$
implies lack of association between classes $i$ and $j$ ($i \ne j$) and lack of segregation of class $i$ from class $j$ implies
association between classes $i$ and $j$ ($i \ne j$). For Dixon's cell-specific test, we have $Z^D_{i1} = -Z^D_{i2}$ for $i = 1, 2$.
For type I cell-specific test, $Z^{I}_{11} = Z^{I}_{22} = -Z^{I}_{12} = -Z^{I}_{21}$; for type II cell-specific test, we have $Z^{II}_{ij} = Z^{D}_{ij}$;
for type III cell-specific test, we have $Z^{III}_{1j} = -Z^{III}_{2j}$ for $j = 1, 2$; and for type IV cell-specific test, we have
$Z^{III}_{ij} = Z^{IV}_{ij}$.

In the multi-class case, a positive z-score for the diagonal cell $(i, i)$ indicates segregation, but it does not
necessarily mean lack of association between class $i$ and class $j$ ($i \ne j$), since it could be the case that class $i$
is associated with one class, yet not associated with another one. See also Section 2.1.

The cell-specific and overall tests are all consistent under both segregation and association alternatives,
which can be shown with the same mechanism as in Ceyhan (2008).



4.7 Post-hoc Tests after Overall Tests: Class-Specific, Pairwise, and One-vs-rest Type Tests

In our construction of the NNCT-tests, although we first introduce the cell-specific tests and then develop 
overall tests based on the cell-specific tests, in practice it is more natural to conduct the tests in reverse order. 
That is, first an overall NNCT-test should be performed, and if significant, then one can perform cell-specific 
tests to determine the types and levels of the spatial interaction patterns between the classes. This procedure 
is in fact analogous to ANOVA F-test to compare multiple groups, in the sense that if the F-test yields a 
significant result, then one performs pairwise tests to determine which pairs are different. 

However, NNCT-tests provide more alternatives (compared to the ANOVA F-test) as post-hoc tests after an
overall test is significant. In the multi-class case, when the null hypothesis is rejected by an overall test, i.e.,
there is evidence in favor of some sort of deviation from randomness of the spatial pattern, the next natural
question is what type of deviation occurs for each class (or species). To this end, one can conduct several
post-hoc tests. One type of post-hoc test is the class-specific tests discussed in Dixon (2002) and Ceyhan (2009).
For pairwise comparison of the interaction between classes, one can resort to two options: (i) in a $q \times q$ NNCT,
one can consider cell-specific tests for each cell (which also provides interaction of the class with itself on the
diagonal cells) and (ii) one can restrict attention to the pair of classes $i, j$ with $i \ne j$ one at a time and conduct
the tests as in the two-class case with a $2 \times 2$ NNCT. We recommend the approach in (i), since it incorporates all
the classes in question and provides the types of interaction in the presence of all classes, while the approach
in (ii) ignores the possible confounding effects of classes different from the pair in question. This in practice
might not give the exact picture of the mixed relationships between all the classes.



As another alternative post-hoc procedure, Dixon (2002) suggested the following. For class $i$, we pool the
remaining classes and treat them as the other class in a two-class setting. Then we apply the two-class tests
to the resulting NNCT. To emphasize the difference, this version of the class-specific test is called one-vs-rest
type test. For $q > 2$ classes, let $\mathcal{N}$ be the $q \times q$ NNCT with cell counts being $N_{ij}$ and let $\widetilde{\mathcal{N}}$ be the $2 \times 2$ NNCT
for the one-versus-rest type procedure with cell counts being $\widetilde{N}_{ij}$. When we are performing a one-versus-rest
type testing for class $i$, without loss of generality, we can assign the first row in $\widetilde{\mathcal{N}}$ to class $i$ and the second
row to the rest. Then $\widetilde{N}_{11} = N_{ii}$, $\widetilde{N}_{12} = \sum_{j \ne i} N_{ij}$, $\widetilde{N}_{21} = \sum_{j \ne i} N_{ji}$, and $\widetilde{N}_{22} = \sum_{j \ne i,\, k \ne i} N_{jk}$. Hence in
the one-versus-rest type testing, the cell-specific test for cell $(1, 1)$ in $\widetilde{\mathcal{N}}$ would be the same as the cell-specific
test for cell $(i, i)$ in $\mathcal{N}$. Therefore, to extract information from $\widetilde{\mathcal{N}}$ that is not provided by $\mathcal{N}$, we consider the
cell-specific tests for cell $(2, 2)$ in $\widetilde{\mathcal{N}}$. The overall test statistics for $\widetilde{\mathcal{N}}$ are also different from the ones for $\mathcal{N}$.
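The pooling step of the one-vs-rest procedure amounts to collapsing the $q \times q$ NNCT into a $2 \times 2$ table. A minimal sketch, assuming NumPy and classes coded $0, \ldots, q-1$; the function name `one_vs_rest` and the example counts are hypothetical.

```python
import numpy as np

def one_vs_rest(table, i):
    """Collapse a q x q NNCT to the 2 x 2 NNCT of class i versus the pooled rest."""
    N = np.asarray(table)
    rest = np.arange(N.shape[0]) != i          # boolean mask of the pooled classes
    return np.array([
        [N[i, i],          N[i, rest].sum()],             # base: class i
        [N[rest, i].sum(), N[np.ix_(rest, rest)].sum()],  # base: the rest
    ])

# Hypothetical 3 x 3 NNCT.
table = [[5, 2, 1],
         [3, 6, 2],
         [1, 4, 7]]
print(one_vs_rest(table, 0))
print(one_vs_rest(table, 1))
```

Since the NN structure does not depend on the labels, collapsing the table gives the same $2 \times 2$ NNCT as relabeling the points into two classes and tabulating again; the two-class tests can then be applied to the result.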

In a multi-class case with $q > 2$ classes, there are $q$ class-specific and one-vs-rest types of tests,
$\binom{q}{2} = q(q-1)/2$ pairwise tests, and $q^2$ cell-specific tests. As $q$ increases, the class-specific tests are
less intensive computationally and easier to interpret, whereas the pairwise tests might yield conflicting results.



5 Empirical Significance Levels in the Two-Class Case 

We provide the empirical significance levels for Dixon's and the new cell-specific and overall segregation tests
in the two-class case under CSR independence and RL patterns. Our Monte Carlo simulation set-up is the same
as in Ceyhan (2010).



5.1 Empirical Significance Levels under CSR Independence of Two Classes 

For the CSR independence pattern, in the two-class case, we label the classes as X and Y, or class 1 and class
2, interchangeably. We generate $n_1$ points from class X and $n_2$ points from class Y, both of which are
independent of each other and independently uniformly distributed on the unit square, $(0, 1) \times (0, 1)$. We use the
sample size combinations $(n_1, n_2) \in \{(10, 10), (10, 30), (10, 50), (30, 30), (30, 50), (50, 50), (50, 100), (100, 100)\}$
and perform $N_{mc} = 10000$ replications. The empirical sizes are calculated as the ratio of the number of significant
results to the number of Monte Carlo replications, $N_{mc}$. We use .05 as our nominal significance level.
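As a rough illustration of one Monte Carlo replicate under CSR independence, the sketch below generates the two classes uniformly on the unit square and tabulates the NNCT by brute-force nearest neighbor search. The helper names `csr_independence` and `nnct` are hypothetical, ties in nearest neighbor distances are ignored (they occur with probability zero for continuous data), and the test statistics themselves are not computed here.

```python
import math
import random


def csr_independence(sizes, rng):
    """Generate sizes[c] independent uniform points on the unit square for each class c."""
    points, labels = [], []
    for cls, n in enumerate(sizes):
        for _ in range(n):
            points.append((rng.random(), rng.random()))
            labels.append(cls)
    return points, labels


def nnct(points, labels, q):
    """Return the q x q NNCT: entry [i][j] counts base points of class i whose NN is of class j."""
    table = [[0] * q for _ in range(q)]
    for a, (xa, ya) in enumerate(points):
        best, best_d = None, math.inf
        for b, (xb, yb) in enumerate(points):
            if a == b:
                continue
            d = (xa - xb) ** 2 + (ya - yb) ** 2  # squared distance suffices for ranking
            if d < best_d:
                best, best_d = b, d
        table[labels[a]][labels[best]] += 1
    return table


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
points, labels = csr_independence((10, 30), rng)
table = nnct(points, labels, q=2)
```

The row sums of the resulting table equal the class sizes, since every base point has exactly one nearest neighbor. An empirical size would then be the fraction of the $N_{mc}$ replicates in which a given test rejects.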

We present the empirical significance levels for the NNCT-tests in Figure 1. The empirical sizes significantly
smaller (larger) than .05 are deemed conservative (liberal). The asymptotic normal approximation
to proportions is used in determining the significance of the deviations of the empirical sizes from the
nominal level of .05. For these proportion tests, we also use $\alpha = .05$ to test against the empirical size being equal
to .05. With $N_{mc} = 10000$, empirical sizes less than .0464 are deemed conservative and those greater than .0536 are
deemed liberal at the $\alpha = .05$ level. These thresholds are indicated as the horizontal lines in Figure 1. Note
also that the sample sizes are arranged in increasing order for the first and then the second entries. The
size values for discrete sample size combinations are joined by piecewise straight lines for better visualization.
Let $\hat{\alpha}^D_{i,j}$ and $\hat{\alpha}^I_{i,j}$-$\hat{\alpha}^{IV}_{i,j}$ be the empirical significance levels of
Dixon's and the cell-specific tests of types I-IV, respectively, and let $\hat{\alpha}_D$ be the empirical significance
level for Dixon's and $\hat{\alpha}_I$-$\hat{\alpha}_{IV}$ for the types I-IV overall segregation tests. Notice that in the
two-class case $\hat{\alpha}^D_{1,1} = \hat{\alpha}^D_{1,2}$ and $\hat{\alpha}^D_{2,1} = \hat{\alpha}^D_{2,2}$, since
$N_{12} = n_1 - N_{11}$ and $N_{21} = n_2 - N_{22}$. Likewise for $\hat{\alpha}^I_{i,j}$-$\hat{\alpha}^{IV}_{i,j}$. So we only
present cell-specific tests for cells (1, 1) and (2, 2). Furthermore, since Dixon's cell-specific test and the type II
cell-specific test are equivalent, and so are the types III and IV cell-specific tests, we only present Dixon's, types
I and III cell-specific tests.
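The thresholds .0464 and .0536 quoted above can be reproduced with the normal approximation to a binomial proportion. The sketch below assumes that each direction (conservative or liberal) is tested one-sided at level .05, which is the reading consistent with the reported values; the function name `size_thresholds` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def size_thresholds(nominal=0.05, n_mc=10000, level=0.05):
    """Bounds outside of which an empirical size differs significantly from `nominal`.

    Assumes a one-sided test in each direction, based on the normal
    approximation to the binomial proportion with n_mc replications.
    """
    z = NormalDist().inv_cdf(1 - level)  # one-sided critical value, about 1.645
    half_width = z * sqrt(nominal * (1 - nominal) / n_mc)
    return nominal - half_width, nominal + half_width


lower, upper = size_thresholds()
print(round(lower, 4), round(upper, 4))  # 0.0464 0.0536
```

An empirical size below the lower bound would be labeled conservative, and one above the upper bound liberal.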

For cell (1, 1), Dixon's cell-specific test has empirical size close to the nominal level of 0.05 for balanced 
sample sizes (i.e., for $n_1 \approx n_2$ or when relative abundances of the classes are similar), while for unbalanced sample
sizes, it tends to be liberal or conservative. On the other hand, types I and III cell-specific tests are less 
severely affected by the differences in relative abundances of the classes, i.e., they are closer to the nominal 
level for all sample size combinations. For cell (2,2), Dixon's cell-specific test is much closer to 0.05 for all 
sample size combinations, while types I and III cell-specific tests have similar performance as in cell (1,1). 
Thus, Dixon's cell-specific test has much better empirical size performance for the diagonal cell corresponding 
to the class with larger size, while types I and III cell-specific tests have better size performance for the 
diagonal cell corresponding to the class with smaller size. 



Empirical Size Plots for the NNCT-Tests under CSR Independence of Two Classes

Figure 1: The empirical size estimates of the cell-specific tests for cell (1,1) (left) and cell (2,2) (middle), and of the
overall segregation tests (right) under the CSR independence pattern in the two-class case. The horizontal
lines are located at .0464 (upper threshold for conservativeness), .0500 (nominal level), and .0536 (lower
threshold for liberalness). The horizontal axis labels: 1=(10,10), 2=(10,30), 3=(10,50), 4=(30,30), 5=(30,50),
6=(50,50), 7=(50,100), 8=(100,100). The legend labeling: D = Dixon's, I = type I, III = type III, and IV = type
IV cell-specific or overall tests.



For the overall tests, Dixon's test and type II test are very similar in size performance (hence only Dixon's 
test is presented). Type I overall test is conservative for smaller samples, while it has the desired level for 
larger samples. Dixon's, types I and III are closer to the nominal level, while they are slightly conservative 
for smaller samples. Type IV overall test has the most unstable behavior: it is extremely liberal for balanced
sample sizes, while it has the desired level for unbalanced sample sizes. In general, Dixon's, type I, or type III
overall tests are recommended in the two-class case.



5.2 Empirical Significance Levels under RL of Two Classes 

For the RL pattern, we consider three cases, in each of which we first determine the locations of points and
then assign labels to them randomly. See Ceyhan (2010) for more detail. We generate $n_1$ points iid $\mathcal{U}(S_1)$
and $n_2$ points iid $\mathcal{U}(S_2)$ for the same combinations of $n_1, n_2$ as in the CSR independence case. The
locations of these points are taken to be the fixed locations to which we assign the labels randomly. For each sample size
combination $(n_1, n_2)$, we randomly choose $n_1$ points (without replacement) and label them as X points and
the remaining $n_2$ points as Y points. We repeat the RL procedure $N_{mc} = 10000$ times for each sample size
combination. Empirical sizes are estimated as in the CSR independence case.

In RL case (1), we have $S_1 = S_2 = (0, 1) \times (0, 1)$ (i.e., the unit square), in RL case (2), $S_1 = (0, 2/3) \times (0, 2/3)$
and $S_2 = (1/3, 1) \times (1/3, 1)$, and in RL case (3), $S_1 = (0, 1) \times (0, 1)$ and $S_2 = (2, 3) \times (0, 1)$.
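A minimal sketch of the RL procedure, assuming the supports of RL case (2), is given below. The locations are generated once and kept fixed; each replicate only reshuffles the labels. Shuffling a label vector with $n_1$ X labels and $n_2$ Y labels is equivalent to choosing $n_1$ locations without replacement for class X. The helper names `uniform_rect` and `random_labeling` are hypothetical, and only five replicates are drawn to keep the example short.

```python
import random


def uniform_rect(n, x_range, y_range, rng):
    """Generate n independent uniform points on the rectangle x_range x y_range."""
    return [(rng.uniform(*x_range), rng.uniform(*y_range)) for _ in range(n)]


def random_labeling(n_points, sizes, rng):
    """Randomly assign labels to fixed locations: sizes[c] of them receive label c."""
    labels = [c for c, n in enumerate(sizes) for _ in range(n)]
    assert len(labels) == n_points
    rng.shuffle(labels)  # uniform over all labelings with the given class sizes
    return labels


rng = random.Random(1)  # fixed seed only for reproducibility of the sketch
n1, n2 = 10, 30

# Fixed locations for RL case (2): two overlapping squares.
locations = (uniform_rect(n1, (0, 2 / 3), (0, 2 / 3), rng)
             + uniform_rect(n2, (1 / 3, 1), (1 / 3, 1), rng))

# Each replicate relabels the same locations; the locations never change.
labelings = [random_labeling(len(locations), (n1, n2), rng) for _ in range(5)]
```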

The locations for which the RL procedure is applied in RL cases (1)-(3) are plotted in Figure 2 for
$n_1 = n_2 = 100$. Observe that in RL case (1), the set of points is iid $\mathcal{U}((0, 1) \times (0, 1))$, i.e., it can be
assumed to be from a Poisson process in the unit square. The set of locations is from two overlapping clusters in RL
case (2), and from two disjoint clusters in RL case (3).

We present the empirical significance levels for the NNCT-tests under the RL cases (1)-(3) in Figure 3.
Under RL cases (1)-(3), for cell (1,1), type I and III cell-specific tests are closer to the desired size compared
to Dixon's cell-specific test, which is severely conservative when the cell size is small. For cell (2,2), type I
and III have similar performance as in cell (1,1), while due to the increase in expected cell counts, Dixon's
test is closer to the desired size. In both cells, the tests are about the desired level for larger sample sizes. For
the overall test, Dixon's test and type II test have very similar size estimates (hence only Dixon's test is
presented). Type IV overall test is severely liberal for balanced sample sizes, while the other tests seem to be
closer to the desired level, with type III overall test having the best performance.



[Panels: RL Case (1), RL Case (2), RL Case (3)]

Figure 2: The fixed locations for which the RL procedure is applied for RL cases (1)-(3) with $n_1 = n_2 = 100$ in
the two-class case. Notice that the x-axis for RL case (3) is scaled differently.

6 Empirical Significance Levels in the Three-Class Case 

In this section, we provide the empirical significance levels for Dixon's and the new overall and cell-specific 
segregation tests in the three-class case under RL and CSR independence patterns. 



6.1 Empirical Significance Levels under CSR Independence of Three Classes 

The symmetry in cell counts for rows in Dixon's cell-specific tests and columns in the new cell-specific tests
occurs only in the two-class case. To better evaluate the performance of the cell-specific and overall tests, we
also consider the three-class case. In the three-class case, we label the classes as X, Y, and Z or class 1,
class 2, and class 3, interchangeably. We generate $n_1$, $n_2$, and $n_3$ points distributed independently uniformly on
the unit square $(0, 1) \times (0, 1)$ from these classes. We use

$(n_1, n_2, n_3) \in \{(10, 10, 10), (10, 10, 30), (10, 10, 50), (10, 30, 30), (10, 30, 50), (30, 30, 30), (10, 50, 50),
(30, 30, 50), (30, 50, 50), (50, 50, 50), (50, 50, 100), (50, 100, 100), (100, 100, 100)\}$

and $N_{mc} = 10000$. The empirical sizes and the significance of their deviation from .05 are calculated as in
Section 5.1.

We present the empirical significance levels for the cell-specific tests in Figure 4 and for the overall tests
in Figure 5. For the cell-specific tests, clearly, type I and III tests are closer to the desired level, and are less
affected by the differences in sample sizes. On the other hand, Dixon's test is extremely liberal or conservative
when sample sizes are very different (which may result in smaller expected cell counts). For the overall tests,
Dixon's, type II and III are closer to the nominal level, with type III being the closest. On the other hand,
type I and IV have opposite performances, in the sense that when one is liberal the other is conservative and
vice versa. In particular, type IV (type I) is liberal (conservative) when sample sizes are very different and
conservative (liberal) for similar sample sizes.



6.2 Empirical Significance Levels under RL of Three Classes 

We also perform Monte Carlo simulations under RL for the three-class case to compare the tests without
conditioning on $Q$ and $R$. Under RL, we consider two cases, in each of which we first determine the locations
of the points, and then assign the labels randomly. We generate $n_1$ points iid $\mathcal{U}(S_1)$, $n_2$ points iid
$\mathcal{U}(S_2)$, and $n_3$ points iid $\mathcal{U}(S_3)$ for each combination of $n_1, n_2, n_3$ as in CSR independence.
The locations of these points are taken to be fixed and we assign the labels randomly. For each sample size combination
$(n_1, n_2, n_3)$, we pick $n_1$ points (without replacement) and label them as X points, pick $n_2$ points from the
remaining points (without replacement) and label them as Y points, and label the remaining $n_3$ points as Z points. We estimate the



Empirical Size Plots for the NNCT-Tests for Two Classes under RL Cases (1)-(3)

[Panel rows: RL Case (1), RL Case (2), RL Case (3). Panel columns: cell (1,1), cell (2,2), overall.]

Figure 3: The empirical size estimates of the cell-specific tests for cells (1,1) (left) and (2,2) (middle) and
overall segregation test (right) under the RL cases (1)-(3) in the two-class case. The horizontal lines, axis
labels, and legend labeling are as in Figure 1.



Empirical Size Plots for the Cell-Specific Tests under CSR Independence

[Panels: cell (1,1), cell (1,2), cell (1,3); cell (2,1), cell (2,2), cell (2,3); cell (3,1), cell (3,2), cell (3,3).]

Figure 4: The empirical size estimates of the cell-specific tests for cells (1,1)-(3,3) under the CSR
independence pattern in the three-class case. The horizontal lines and legend labeling are as in Figure
1. The horizontal axis labels are: 1=(10,10,10), 2=(10,10,30), 3=(10,10,50), 4=(10,30,30), 5=(10,30,50),
6=(30,30,30), 7=(10,50,50), 8=(30,30,50), 9=(30,50,50), 10=(50,50,50), 11=(50,50,100), 12=(50,100,100),
13=(100,100,100).



Empirical Size Plots for the Overall Tests under CSR Independence

[Panel: overall (3 classes: CSR).]

Figure 5: The empirical size estimates of the overall tests under the CSR independence pattern in the
three-class case. The horizontal lines and legend labeling are as in Figure 1 and axis labels are as in Figure 4.

empirical size estimates based on $N_{mc} = 10000$ replications for each sample size combination as in the CSR
independence case.

In RL case (1), we take $S_1 = S_2 = S_3 = (0, 1) \times (0, 1)$, and in RL case (2), $S_1 = (0, 1) \times (0, 1)$,
$S_2 = (2, 3) \times (0, 1)$, and $S_3 = (1, 2) \times (2, 3)$. The locations for which the RL procedure is applied in RL cases
(1) and (2) are plotted in Figure 6 for $n_1 = n_2 = n_3 = 100$. In RL case (1), the locations of the points can
be assumed to be from a Poisson process in the unit square. In RL case (2), the locations of the points are
from three disjoint clusters.

We present the empirical significance levels under RL case 1 in Figure 7 and under RL case 2 in Figure
8. Under both RL cases, type I and III cell-specific tests perform better in terms of empirical size (i.e., their
empirical sizes are closer to the desired level) and are less severely affected by smaller cell counts and unbalanced
sample sizes, compared to Dixon's cell-specific test. Empirical sizes for the overall tests under RL cases 1 and
2 are presented in Figure 9. Dixon's, type I and III cell-specific tests are closer to the nominal level with type
III being the closest. Furthermore, type I and IV exhibit the performance opposite to one another as in the
CSR independence case.



7 Empirical Power Analysis in the Two-Class Case 

We consider three cases for each of segregation and association alternatives in the two-class case. 



7.1 Empirical Power Analysis under Segregation of Two Classes 

Under the segregation alternatives, we generate $X_i \stackrel{iid}{\sim} \mathcal{U}(S_1)$ and
$Y_j \stackrel{iid}{\sim} \mathcal{U}(S_2)$ where $S_1 = (0, 1 - s) \times (0, 1 - s)$
and $S_2 = (s, 1) \times (s, 1)$ for $i = 1, \ldots, n_1$ and $j = 1, \ldots, n_2$ and $s \in (0, 1)$. We consider the
following three segregation alternatives:

$H_S^{I}: s = 1/6$, $H_S^{II}: s = 1/4$, and $H_S^{III}: s = 1/3$. (30)

Notice that the level of segregation increases as $s$ increases; that is, segregation gets stronger from $H_S^{I}$ to
$H_S^{III}$. We calculate the power estimates using the asymptotic critical values based on the standard normal
distribution for the cell-specific tests and the corresponding $\chi^2$-distributions for the overall tests.
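The data generation under the segregation alternatives can be sketched as follows. The sketch assumes that the support of class Y is the square $(s, 1) \times (s, 1)$, mirroring the support of class X; the function name `segregation_sample` is hypothetical.

```python
import random


def segregation_sample(n1, n2, s, rng):
    """Generate X points uniform on (0, 1-s)^2 and Y points uniform on (s, 1)^2."""
    xs = [(rng.uniform(0, 1 - s), rng.uniform(0, 1 - s)) for _ in range(n1)]
    ys = [(rng.uniform(s, 1), rng.uniform(s, 1)) for _ in range(n2)]
    return xs, ys


rng = random.Random(2)  # fixed seed only for reproducibility of the sketch
s = 1 / 3               # the strongest of the three alternatives in (30)
xs, ys = segregation_sample(30, 50, s, rng)
```

As $s$ grows, the overlap region $(s, 1 - s) \times (s, 1 - s)$ of the two supports shrinks, and it is empty for $s \geq 1/2$, which is one way to see why larger $s$ corresponds to stronger segregation.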

The power estimates based on the asymptotic critical values are presented in Figure 10. We omit the



[Panels: RL Case (1), RL Case (2). Horizontal axes: x coordinate.]

Figure 6: The fixed locations for which the RL procedure is applied for RL cases (1) and (2) with $n_1 = n_2 =
n_3 = 100$ in the three-class case. Notice that the x-axis for RL case (2) is scaled differently.

power estimates of the cell-specific tests for cells (1, 2) and (2, 1), since they would be the same as those for cells
(1, 1) and (2, 2) (but in the reverse direction). As expected, the power estimates increase as segregation gets stronger
and also as sample size increases. For the cell-specific and overall tests, type I and III tests have the highest
power estimates. Dixon's and type II tests have very similar but lower power estimates, and the lowest power
estimates are with the type IV overall test.

7.2 Empirical Power Analysis under Association of Two Classes 

Under the association alternatives, we consider three cases also. In each case, we generate
$X_i \stackrel{iid}{\sim} \mathcal{U}((0, 1) \times (0, 1))$ for $i = 1, 2, \ldots, n_1$. Then we generate $Y_j$ associated
with X's for $j = 1, 2, \ldots, n_2$ as follows. For each $j$, select an $i$ randomly, and set
$Y_j = X_i + R_j (\cos T_j, \sin T_j)'$ where $R_j \sim \mathcal{U}(0, r)$ with $r \in (0, 1)$ and
$T_j \sim \mathcal{U}(0, 2\pi)$. We consider the following association alternatives:

$H_A^{I}: r = 1/4$, $H_A^{II}: r = 1/7$, and $H_A^{III}: r = 1/10$. (31)

Notice that association gets stronger as $r$ decreases; that is, association gets stronger from $H_A^{I}$ to $H_A^{III}$.
Furthermore, by construction, the association of Y points with X points is stronger, compared to the
association of X points with Y points.
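The construction of the associated Y points can be sketched as below: each Y point is a randomly chosen X point displaced by a uniform radius in $(0, r)$ and a uniform angle in $(0, 2\pi)$. The function name `association_sample` is hypothetical, and the sketch does not constrain the Y points to the unit square, since the text does not state such a constraint.

```python
import math
import random


def association_sample(n1, n2, r, rng):
    """Generate X uniform on the unit square and Y points within distance r of a random X."""
    xs = [(rng.random(), rng.random()) for _ in range(n1)]
    ys = []
    for _ in range(n2):
        px, py = rng.choice(xs)               # select an i randomly
        radius = rng.uniform(0, r)            # R_j ~ U(0, r)
        angle = rng.uniform(0, 2 * math.pi)   # T_j ~ U(0, 2*pi)
        ys.append((px + radius * math.cos(angle), py + radius * math.sin(angle)))
    return xs, ys


rng = random.Random(3)  # fixed seed only for reproducibility of the sketch
r = 1 / 10              # the strongest of the three alternatives in (31)
xs, ys = association_sample(10, 30, r, rng)
```

Every Y point lies within distance $r$ of some X point, but an X point need not have any Y point nearby, which illustrates the asymmetry noted at the end of the paragraph above.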

The empirical power estimates under association are presented in Figure 11. As expected, as association
gets stronger, the power estimates increase. However, there is no clear trend as the total sample size $n$ increases,
since, by construction, the association alternative depends on the sample size differences. For cell (1,1), type
I and III cell-specific tests have higher power estimates, while for cell (2, 2) Dixon's test has higher power
estimates for weak association and smaller sample sizes.



8 Empirical Power Analysis in the Three-Class Case

We also consider three cases for each of segregation and association alternatives in the three-class case. 

8.1 Empirical Power Analysis under Segregation of Three Classes 

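Under the segregation alternatives, we generate $X_i \stackrel{iid}{\sim} \mathcal{U}(S_1)$,
$Y_j \stackrel{iid}{\sim} \mathcal{U}(S_2)$, and $Z_k \stackrel{iid}{\sim} \mathcal{U}(S_3)$ for $i = 1, \ldots, n_1$,
$j = 1, \ldots, n_2$, and $k = 1, \ldots, n_3$ where $S_1 = (0, 1 - 2s) \times (0, 1 - 2s)$, $S_2 = (2s, 1) \times (2s, 1)$, and $S_3 =$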



Empirical Size Plots for the Cell-Specific Tests under RL case (1)

[Panels: cells (1,1)-(3,3).]

Figure 7: The empirical size estimates of the cell-specific tests for cells (1,1)-(3,3) under the RL case 1
in the three-class case. The horizontal lines and legend labeling are as in Figure 1 and axis labeling is as in
Figure 4.



Empirical Size Plots for the Cell-Specific Tests under RL case (2)

[Panels: cells (1,1)-(3,3).]

Figure 8: The empirical size estimates of the cell-specific tests for cells (1,1)-(3,3) under the RL case 2
in the three-class case. The horizontal lines and legend labeling are as in Figure 1 and axis labeling is as in
Figure 4.



Empirical Size Plots for the Overall Tests under RL cases 1 and 2

[Panels: overall (3 classes, RL case 1), overall (3 classes, RL case 2).]

Figure 9: The empirical size estimates of the overall tests under the RL cases 1 and 2 in the three-class case.
The horizontal lines and legend labeling are as in Figure 1 and axis labeling is as in Figure 4.

$(s, 1 - s) \times (s, 1 - s)$ with $s \in (0, 1/2)$. We consider the following segregation alternatives:

$H_{S_1}: s = 1/12$, $H_{S_2}: s = 1/8$, and $H_{S_3}: s = 1/6$. (32)

Notice that, as $s$ increases, segregation between the classes gets stronger; that is, segregation gets stronger
from $H_{S_1}$ to $H_{S_3}$. Furthermore, by construction, classes X and Y are more segregated compared to Z and X
or Z and Y. In fact, the segregation between X and Z and the segregation between Y and Z are identical (as a
stochastic process).

Empirical power estimates for the diagonal cells (1,1), (2,2), and (3,3) under segregation alternatives
are plotted in Figure 12 and for the off-diagonal cells (1,2), (1,3), and (2,3) are plotted in Figure 13. For
diagonal cells (1, 1) and (2,2) type I and III tests have higher power, while for diagonal cell (3,3), all tests
have similar power estimates. For the off-diagonal cells (1, 2) and (1, 3) all tests have similar power estimates,
while for cell (2, 3) type I and III tests have higher power. In line with our simulation setup, power estimates
for cells (1, 1) and (2, 2) are higher compared to cell (3, 3), as classes X and Y are more segregated compared
to class Z. For the same reason, power estimates for cell (1, 2) are higher compared to cells (1, 3) and (2, 3).

Empirical power estimates for the overall tests are presented in Figure 14. Dixon's and type II tests have
very similar power estimates, while type III test has higher power than these tests. However, type I test is
highly dependent on differences in relative abundances of the classes. In particular, for balanced samples,
type I test has the lowest power estimates, while for unbalanced samples, type I test has the highest power
estimates.



8.2 Empirical Power Analysis under Association of Three Classes 

Under the association alternatives, we also consider three cases. We generate
$X_i \stackrel{iid}{\sim} \mathcal{U}((0, 1) \times (0, 1))$ for $i = 1, 2, \ldots, n_1$. Then we generate $Y_j$ and $Z_k$
for $j = 1, 2, \ldots, n_2$ and $k = 1, 2, \ldots, n_3$ as follows. For each $j$, select an $i$ randomly, and set
$Y_j := X_i + R_j^Y (\cos T_j, \sin T_j)'$ where $R_j^Y \sim \mathcal{U}(0, r_y)$ with $r_y \in (0, 1)$ and
$T_j \sim \mathcal{U}(0, 2\pi)$. Similarly, for each $k$, select an $i'$ randomly, and set
$Z_k := X_{i'} + R_k^Z (\cos U_k, \sin U_k)'$ where $R_k^Z \sim \mathcal{U}(0, r_z)$ with $r_z \in (0, 1)$ and
$U_k \sim \mathcal{U}(0, 2\pi)$. We consider the following association alternatives:

$H_{A_1}: r_y = 1/7,\ r_z = 1/10$; $H_{A_2}: r_y = 1/10,\ r_z = 1/20$; $H_{A_3}: r_y = 1/13,\ r_z = 1/30$. (33)

As $r_y$ and $r_z$ decrease, the level of association increases. That is, the association between X and Y and the
association between X and Z get stronger from $H_{A_1}$ to $H_{A_3}$. By construction, classes Y and Z are associated




Figure 10: The empirical power estimates for the cell-specific (left and middle columns) and the overall tests
(right column) under the segregation alternatives in the two-class case. The horizontal axis labels and legend
labeling are as in Figure 1.



Empirical Power Estimates of the NNCT-Tests under the Association Alternatives

[Panel rows: power estimates under $H_A^{I}$, $H_A^{II}$, and $H_A^{III}$. Panel columns: cell (1,1), cell (2,2), overall.]

Figure 11: The empirical power estimates for the NNCT-tests under the association alternatives in the
two-class case. The horizontal axis labels and legend labeling are as in Figure 1.



Empirical Power Estimates of Cell-Specific Tests under the Segregation Alternatives (Diagonal Cells)

[Panel rows: power estimates under $H_{S_1}$, $H_{S_2}$, and $H_{S_3}$. Panel columns: cell (1,1), cell (2,2), cell (3,3).]

Figure 12: The empirical power estimates of the cell-specific tests for cells (1,1), (2,2), and (3,3) under
the segregation alternatives $H_{S_1}$ (top), $H_{S_2}$ (middle), and $H_{S_3}$ (bottom) in the three-class case. The legend
labeling is as in Figure 1 and horizontal axis labels are as in Figure 4.



Empirical Power Estimates of Cell-Specific Tests under the Segregation Alternatives (Off-Diagonal Cells)

[Panel rows: power estimates under $H_{S_1}$, $H_{S_2}$, and $H_{S_3}$. Panel columns: cell (1,2), cell (1,3), cell (2,3).]

Figure 13: The empirical power estimates of the cell-specific tests for cells (1,2), (1,3), and (2,3) under
the segregation alternatives $H_{S_1}$ (top), $H_{S_2}$ (middle), and $H_{S_3}$ (bottom) in the three-class case. The legend
labeling is as in Figure 1 and horizontal axis labels are as in Figure 4.



Empirical Power Estimates of Overall Tests under $H_S$

[Panels: overall, overall, overall.]

Figure 14: The empirical power estimates of the overall tests under the segregation alternatives $H_{S_1}$ (left),
$H_{S_2}$ (middle), and $H_{S_3}$ (right) in the three-class case. The legend labeling is as in Figure 1 and horizontal
axis labels are as in Figure 4.

with class X, while classes Y and Z are not associated, but perhaps mildly segregated for small $r_y$ and $r_z$.
Furthermore, by construction, classes X and Z are more associated compared to classes X and Y.

The empirical power estimates for cells (1, 2) and (2, 1) are presented in Figure 15 and for cells (1, 3) and
(3, 1) are presented in Figure 16. For cells (1, 2) and (1, 3), type I and III cell-specific tests have higher power,
while for cells (2, 1) and (3, 1), Dixon's cell-specific test has higher power. The power estimates for the overall
tests are presented in Figure 17. For the overall test, there is no clear winner in terms of power. For large
samples, Dixon's test, type II and III have similar and larger power, while for smaller samples type IV tends
to have higher power estimates.

9 Empirical Size and Power Analysis for the One-vs-Rest Type 
Tests in the Three Class Case 

In one-versus-rest type testing, we implement Monte Carlo simulations as in Section 6.1 to assess the empirical
size performance of these tests under CSR independence. We present the empirical size estimates for various
sample size combinations in Figure 18, where only cell-specific tests for cell (2, 2) and the overall test are
presented, since the cell-specific test for cell (1, 1) is the same as in the 3 x 3 NNCT analysis. Among
cell-specific tests, types I and III tests perform better compared to Dixon's test, since they are closer to the
nominal level for large samples. For the overall tests, except for type IV test, the tests are about the nominal
level, but closer to the nominal level at different sample size combinations.

To evaluate the power performance of these tests, we perform simulations under segregation alternatives
as in Section 8.1. The empirical power estimates under the three segregation alternatives are presented in
Figures 19 and 20. Among the tests, type I and III tests have higher power estimates compared to other
types. One class-vs-rest tests for classes 1 and 2 have higher power estimates, compared to that of class 3.
This occurs since, by construction, classes 1 and 2 are equally segregated from other classes, and these classes
are more segregated compared to class 3.

For the association alternatives, we perform the simulations as in Section 8.2. The corresponding power
estimates under the three association alternatives are presented in Figures 21 and 22. Under the association
alternatives, Dixon's cell-specific test and Dixon's and type IV overall tests have better power performance.
Since, by construction, class 1-vs-rest statistics test the association of the other classes with class 1, they have
higher power estimates. For classes 1 and 2, Dixon's one class-vs-rest tests have higher power, while the class
3-vs-rest tests have similar power, although type I and III tests have slightly higher power estimates.



Empirical Power Estimates of Cell-Specific Tests under $H_A$

[Panel rows: cell (1,2), cell (2,1). Panel columns: $H_{A_1}$, $H_{A_2}$, $H_{A_3}$.]

Figure 15: The empirical power estimates of the cell-specific tests for cells (1,2) and (2,1) under the
association alternatives $H_{A_1}$ (left), $H_{A_2}$ (middle), and $H_{A_3}$ (right) in the three-class case. The legend labeling
is as in Figure 1 and horizontal axis labels are as in Figure 4.



Empirical Power Estimates of Cell-Specific Tests under $H_A$

[Panel rows: cell (1,3), cell (3,1). Panel columns: $H_{A_1}$, $H_{A_2}$, $H_{A_3}$.]

Figure 16: The empirical power estimates of the cell-specific tests for cells (1,3) and (3,1) under the
association alternatives $H_{A_1}$ (left), $H_{A_2}$ (middle), and $H_{A_3}$ (right) in the three-class case. The legend labeling
is as in Figure 1 and horizontal axis labels are as in Figure 4.



Empirical Power Estimates of Overall Tests under $H_A$

[Panels: overall, overall, overall.]

Figure 17: The empirical power estimates of the overall test under the association alternatives $H_{A_1}$ (left),
$H_{A_2}$ (middle), and $H_{A_3}$ (right) in the three-class case. The legend labeling is as in Figure 1 and horizontal
axis labels are as in Figure 4.



Empirical Size Estimates of Cell-Specific Tests for cell (2, 2) under CSR

[Panels: class 1 vs rest, class 2 vs rest, class 3 vs rest.]

Empirical Size Estimates of Overall Tests under CSR

[Panels: overall (class 1 vs rest), overall (class 2 vs rest), overall (class 3 vs rest). Legend: D, I, III, IV.]

Figure 18: The empirical size estimates of the cell-specific tests for cell (2, 2) and overall tests under CSR
independence with one-versus-rest type testing. The legend labeling is as in Figure 1 and horizontal axis
labels are as in Figure 4.



Empirical Power Estimates of Cell-Specific Tests under the Segregation Alternatives (One-versus-Rest)

[Panel rows: power estimates under $H_{S_1}$, $H_{S_2}$, and $H_{S_3}$. Panel columns: class 1 vs rest, class 2 vs rest, class 3 vs rest.]

Figure 19: The empirical power estimates of the cell-specific tests for cell (2,2), under the segregation
alternatives $H_{S_1}$ (top), $H_{S_2}$ (middle), and $H_{S_3}$ (bottom) in the three-class case with the one-versus-rest type
testing. The legend labeling is as in Figure 1 and horizontal axis labels are as in Figure 4.



Empirical Power Estimates of Overall Tests under the Segregation Alternatives (One-versus-Rest)

[Panel rows: power estimates under $H_{S_1}$, $H_{S_2}$, and $H_{S_3}$. Panel columns: overall (class 1 vs rest), overall (class 2 vs rest), overall (class 3 vs rest). Legend: D, I, III, IV.]

Figure 20: The empirical power estimates of the overall tests under the segregation alternatives $H_{S_1}$ (top),
$H_{S_2}$ (middle), and $H_{S_3}$ (bottom) in the three-class case with the one-versus-rest type testing. The legend
labeling is as in Figure 1 and horizontal axis labels are as in Figure 4.



Empirical Power Estimates of Cell-Specific Tests under the Association Alternatives (One-versus-Rest)

[Panel rows: power estimates under $H_{A_1}$, $H_{A_2}$, and $H_{A_3}$. Panel columns: class 1 vs rest, class 2 vs rest, class 3 vs rest. Legend: D, I, III.]

Figure 21: The empirical power estimates of the cell-specific tests for cell (2,2), under the association
alternatives $H_{A_1}$ (top), $H_{A_2}$ (middle), and $H_{A_3}$ (bottom) in the three-class case with the one-versus-rest type
testing. The legend labeling is as in Figure 1 and horizontal axis labels are as in Figure 4.



Empirical Power Estimates of Overall Tests under the Association Alternatives (One-versus-Rest)

[Panel rows: power estimates under $H_{A_1}$, $H_{A_2}$, and $H_{A_3}$. Panel columns: overall (class 1 vs rest), overall (class 2 vs rest), overall (class 3 vs rest).]

Figure 22: The empirical power estimates of the overall tests under the association alternatives $H_{A_1}$ (top),
$H_{A_2}$ (middle), and $H_{A_3}$ (bottom) in the three-class case with the one-versus-rest type testing. The legend
labeling is as in Figure 1 and horizontal axis labels are as in Figure 4.



Swamp Tree Data

[Scatter plot. Horizontal axis: x coordinate (m).]

Figure 23: The scatter plot of the locations of black gum trees (triangles), Carolina ashes (pluses +), and bald
cypress trees (crosses x).





                                   NN
              B.G.          C.A.          B.C.          sum
base  B.G.    142 (69 %)     40 (26 %)    23 (23 %)    205 (45 %)
      C.A.     34 (17 %)     98 (63 %)    25 (26 %)    156 (34 %)
      B.C.     38 (19 %)     32 (21 %)    28 (29 %)     98 (21 %)
      sum     214 (47 %)    170 (37 %)    76 (17 %)    460 (100 %)

Table 2: The NNCT for swamp tree data and the corresponding percentages (in parentheses), where the cell
percentages are with respect to the size of the NN species, and marginal percentages are with respect to the
total size. B.G. = black gums, C.A. = Carolina ashes, and B.C. = bald cypresses.



10 Example Data: Swamp Tree Data 



The NNCT methodology is illustrated on an ecological data set: the swamp tree data of Good and Whipple
(1982), which was also analyzed by Dixon (1994, 2002). The data set is described in detail in Ceyhan (2010).
Briefly, the plot contains 13 different tree species, of which four species account for over 90 % of the 734 tree
stems. In our analysis, we only consider black gums (Nyssa sylvatica), Carolina ashes (Fraxinus caroliniana),
and bald cypresses (Taxodium distichum), as if only these three tree species exist in the area; that is, for
illustrative purposes we ignore the possible effects of other species on the spatial interaction between these
species. Thus, we perform a 3 × 3 NNCT-analysis on this data set. See Figure 23 for the locations of the
trees in this plot and Table 2 for the associated 3 × 3 NNCT together with cell percentages based on NN class
sizes, and marginal percentages based on the grand sum, n. When, e.g., black gum is the base species and
Carolina ash is the NN species, the cell count is 40, which is 26 % of the 156 Carolina ashes (and Carolina
ashes are 34 % of all trees). The percentages in Table 2 and Figure 23 suggest that each tree species is
segregated from the other trees, as the observed percentages of species in the diagonal cells are much larger
than the row percentages (or species percentages).
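The percentages quoted above can be reproduced directly from the counts in Table 2. The following is a minimal sketch, assuming NumPy is available; the variable names (`nnct`, `class_sizes`, `cell_pct`, `col_pct`) are illustrative and not from the paper, and the class sizes are taken exactly as printed in the "sum" column of Table 2.

```python
import numpy as np

# NNCT from Table 2: rows are base species, columns are NN species
# (order: black gum, Carolina ash, bald cypress).
nnct = np.array([[142, 40, 23],
                 [34, 98, 25],
                 [38, 32, 28]])

# Class sizes as printed in the "sum" column of Table 2 (assumption: used as given).
class_sizes = np.array([205, 156, 98])

# Cell percentages: each count divided by the size of the NN (column) species.
cell_pct = 100 * nnct / class_sizes  # broadcasting divides column j by class_sizes[j]

# Marginal percentages of the column sums with respect to the grand sum.
col_pct = 100 * nnct.sum(axis=0) / nnct.sum()

print(np.rint(cell_pct).astype(int))
print(np.rint(col_pct).astype(int))
```

For instance, the (B.G., C.A.) entry of `cell_pct` is 40/156, which rounds to the 26 % discussed in the text.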



The null model in a NNCT analysis depends on the particular ecological context. Goreaud and Pélissier
(2003) state that under CSR independence, the two classes are a priori the result of different processes
(e.g., individuals of different species or age cohorts). On the other hand, under RL, some processes affect
a posteriori the individuals of a single population (e.g., diseased vs. non-diseased individuals of a single
species). Hence, in the swamp tree data, the locations of the tree species can be viewed as resulting a priori from
different processes, so the more appropriate null hypothesis is the CSR independence pattern. We compute



Overall tests

           C_D        C_I        C_II       C_III      C_IV
           76.92      66.06      76.71      66.11      78.82
p_asy      < .0001    < .0001    < .0001    < .0001    < .0001
p_mc       < .0001    < .0001    < .0001    < .0001    < .0001
p_rand     < .0001    < .0001    < .0001    < .0001    < .0001

Table 3: Test statistics and the corresponding p-values for the overall tests. $p_{asy}$, $p_{mc}$, and $p_{rand}$
stand for the p-values based on the asymptotic approximation, Monte Carlo simulation, and randomization
of the tests, respectively. $C_D$ stands for Dixon's overall test and $C_I$ to $C_{IV}$ are for types I-IV overall tests.



Q = 282 and R = 288 for this data set, and our inference will be conditional on these values. Dixon's and the
new overall segregation tests and the associated p-values are presented in Table 3, where $p_{asy}$ stands for the
p-value based on the asymptotic approximation, $p_{mc}$ is the p-value based on 10000 Monte Carlo replications
of the CSR independence pattern in the same plot, and $p_{rand}$ is based on Monte Carlo randomization of the
labels on the given locations of the trees 10000 times. Notice that $p_{asy}$, $p_{mc}$, and $p_{rand}$ are all significant. The
cell-specific test statistics and the associated p-values are presented in Table 4, where p-values are calculated
as in Table 3. Again, all three p-values in Table 4 are similar for each cell-specific test.
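To make the quantities Q and R concrete, the sketch below computes a NNCT together with Q and R from point coordinates and class labels. It assumes the usual definitions (R is twice the number of reflexive NN pairs, and Q counts ordered pairs of distinct points that share a NN); these definitions are stated elsewhere in the paper, not in this passage, so they should be read as an assumption here. The function names and the toy data are hypothetical, and distance ties are assumed not to occur.

```python
import numpy as np

def nn_indices(coords):
    """Index of the nearest neighbor of each point (brute force, O(n^2) memory)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    return dist.argmin(axis=1)

def nnct_q_r(coords, labels, k):
    """Return (NNCT, Q, R) for integer labels in 0..k-1."""
    nn = nn_indices(coords)
    nnct = np.zeros((k, k), dtype=int)
    # Row = class of the base point, column = class of its NN.
    np.add.at(nnct, (labels, labels[nn]), 1)
    # R: number of points that are the NN of their own NN
    # (each reflexive pair is counted twice).
    r = int(np.sum(nn[nn] == np.arange(len(nn))))
    # Q: a point serving as NN to m others contributes m * (m - 1) shared-NN pairs.
    times_chosen = np.bincount(nn, minlength=len(nn))
    q = int(np.sum(times_chosen * (times_chosen - 1)))
    return nnct, q, r

# Toy example: four points on a line, two classes.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [7.0, 0.0]])
labels = np.array([0, 0, 1, 1])
table, q, r = nnct_q_r(coords, labels, k=2)
print(table, q, r)
```

In the toy example, points 0 and 1 form the only reflexive pair (so R = 2), and point 1 is the NN of both point 0 and point 2 (so Q = 2).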

The overall segregation tests are all highly significant, which implies that there is significant deviation
from the CSR independence pattern for at least one of the tree species. To determine which species exhibit
segregation or association, we perform the cell-specific tests as a post-hoc analysis. At the 0.05 level, Dixon's and
the new cell-specific tests agree for all cells in terms of significance except for the (B.C.,B.C.) cell, at which Dixon's
test is not significant but types I and III are significant. At the 0.10 level, the tests agree for cells except (B.C.,B.G.)
and (C.A.,B.C.): at cell (B.C.,B.G.) Dixon's test is not significant but types I and III are significant, while at
cell (C.A.,B.C.) Dixon's test is significant but types I and III are not. At the 0.01 level, the tests agree at cells except
for cell (B.G.,B.C.), at which Dixon's test is significant while types I and III are not. The test statistics are
all positive (negative) for the diagonal (off-diagonal) cells, which also supports the segregation of species.
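As a rough illustration of how a cell-specific statistic of Dixon's type is formed from a cell count, the class sizes, Q, and R, the following sketch standardizes N_ij by its mean and variance under the null pattern. The mean and variance expressions are those commonly attributed to Dixon (1994); they are not restated in this passage, so they are an assumption here, and the function name `dixon_cell_z` is hypothetical. The inputs are the values printed in Table 2 together with Q = 282 and R = 288.

```python
import math

def dixon_cell_z(n_obs, n_i, n_j, n, q, r, diagonal):
    """Dixon-type Z statistic for one NNCT cell (sketch; formulas assumed)."""
    d3 = n * (n - 1) * (n - 2)
    d4 = d3 * (n - 3)
    rest = n * n - 3 * n - q + r
    if diagonal:
        p2 = n_i * (n_i - 1) / (n * (n - 1))                    # p_ii
        p3 = n_i * (n_i - 1) * (n_i - 2) / d3                   # p_iii
        p4 = n_i * (n_i - 1) * (n_i - 2) * (n_i - 3) / d4       # p_iiii
        var = (n + r) * p2 + (2 * n - 2 * r + q) * p3 + rest * p4 - (n * p2) ** 2
    else:
        p2 = n_i * n_j / (n * (n - 1))                          # p_ij
        p3 = n_i * (n_i - 1) * n_j / d3                         # p_iij
        p4 = n_i * (n_i - 1) * n_j * (n_j - 1) / d4             # p_iijj
        var = n * p2 + q * p3 + rest * p4 - (n * p2) ** 2
    expected = n * p2  # E[N_ij] under the null pattern
    return (n_obs - expected) / math.sqrt(var)

n_total = 205 + 156 + 98  # class sizes as printed in Table 2
z11 = dixon_cell_z(142, 205, 205, n_total, 282, 288, diagonal=True)
z12 = dixon_cell_z(40, 205, 156, n_total, 282, 288, diagonal=False)
print(round(z11, 2), round(z12, 2))
```

Under these assumptions the two values should land near the 6.60 and -4.50 reported for the (B.G.,B.G.) and (B.G.,C.A.) cells in Table 4; small discrepancies are possible because the sketch uses the rounded inputs printed in the tables.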



For a given class i, we estimate the probabilities $\pi_{ij}$ of Section 2.1 as $\hat{\pi}_{ij} = N_{ij}/n_j$. The estimated probabilities
are presented in parentheses as percentages in Table 2. For example, for the (B.G.,C.A.) cell, $\hat{\pi}_{12} = N_{12}/n_2 =
40/156 \approx 0.26$. For black gums, we have $\hat{\pi}_{11} = 0.69 > 0.26 + 0.23$, so black gums exhibit total segregation
from the other two tree species. Similarly, for Carolina ashes, we have $\hat{\pi}_{22} = 0.63 > 0.17 + 0.26$, so Carolina
ashes exhibit total segregation from the other two tree species. However, bald cypresses only exhibit strong
segregation, since $\hat{\pi}_{33} = 0.29 < 0.19 + 0.21$, but 0.29 > 0.19 and 0.29 > 0.21.
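The comparisons in the preceding paragraph can be written as a small rule. This is only a sketch of the rule as it is applied in this passage (total segregation when the diagonal estimate exceeds the sum of the other estimates in its row, strong segregation when it exceeds each of them individually); the formal definitions, including partial segregation, are given elsewhere in the paper and are not reproduced here. The function name `segregation_level` is hypothetical.

```python
import numpy as np

# Counts and class sizes as printed in Table 2 (black gum, Carolina ash, bald cypress).
nnct = np.array([[142, 40, 23],
                 [34, 98, 25],
                 [38, 32, 28]])
class_sizes = np.array([205, 156, 98])

# Estimated probabilities: N_ij divided by the size of the NN (column) class.
pi_hat = nnct / class_sizes

def segregation_level(pi_hat, i):
    """Label class i using the comparisons made in the text (sketch only)."""
    own = pi_hat[i, i]
    others = np.delete(pi_hat[i], i)
    if own > others.sum():
        return "total"
    if np.all(own > others):
        return "strong"
    return "not classified by this sketch"

levels = [segregation_level(pi_hat, i) for i in range(3)]
print(levels)
```

With the Table 2 counts this labels black gums and Carolina ashes as "total" and bald cypresses as "strong", matching the discussion above.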

We also present the one-vs-rest cell-specific and overall tests (see Table 5). For each species, we observe
that the other species combined tend to be segregated from the species in consideration, but to a lesser extent
for bald cypresses.

The spatial interaction is significant for each species, but at different levels. In particular, black gums
exhibit significant segregation from other species (they are significantly segregated from both Carolina ashes
and bald cypresses), Carolina ashes exhibit significant segregation from other species (they are significantly
segregated from black gums but not from bald cypresses), and bald cypresses exhibit significant segregation
from other species (they are moderately segregated from black gums only, but when the two species of black
gums and Carolina ashes are considered together, the (B.C.,B.C.) cell is significant).

However, these results pertain to interaction at about the average NN distances. For the swamp tree
data, the average NN distance (± standard deviation) is about 2.1 (± 1.35) meters. We might also be interested
in the possible causes of the segregation and the type and level of interaction between the tree species at
different distances between the trees. Along this line, we also present the second-order analysis of the swamp
tree data by the pair correlation function g(t) (Stoyan and Stoyan (1994)). The pair correlation function of
a (univariate) stationary point process is defined as $g(t) = \frac{K'(t)}{2 \pi t}$, where K'(t) is the derivative of Ripley's
K(t) function. For a univariate stationary Poisson process, g(t) = 1; values of g(t) > 1 suggest clustering (or
aggregation) and values of g(t) < 1 suggest inhibition (or regularity) between points. The pair correlation
functions for each species are plotted in Figure 24. Black gums are aggregated for distance values of about
1-6 and 9-11 m; Carolina ashes are aggregated over the whole range of the plotted distances; and bald cypresses
are aggregated for distance values of about 2-8 and around 11 m. These distance ranges at which species



Dixon's cell-specific tests

        B.G.                                 C.A.                                 B.C.
B.G.     6.60 (< .0001, < .0001, < .0001)    -4.50 (< .0001, < .0001, < .0001)    -3.72 (.0002, < .0001, .0005)
C.A.    -5.69 (< .0001, < .0001, < .0001)     6.64 (< .0001, < .0001, < .0001)    -1.72 (.0846, .0855, .0846)
B.C.    -1.16 (.2469, .2773, .2406)          -0.33 (.7445, .7539, .7284)           1.52 (.1292, .1228, .1402)

Type I cell-specific tests

        B.G.                                 C.A.                                 B.C.
B.G.     6.94 (< .0001, < .0001, < .0001)    -6.33 (< .0001, < .0001, < .0001)    -2.36 (.0185, .0176, .0189)
C.A.    -6.90 (< .0001, < .0001, < .0001)     6.54 (< .0001, < .0001, < .0001)    -0.23 (.8158, .8153, .8137)
B.C.    -1.65 (.0986, .1000, .0933)          -0.99 (.3234, .3251, .3285)           2.62 (.0089, .0085, .0082)

Type III cell-specific tests

        B.G.                                 C.A.                                 B.C.
B.G.     6.94 (< .0001, < .0001, < .0001)    -6.33 (< .0001, < .0001, < .0001)    -2.35 (.0187, .0179, .0192)
C.A.    -6.90 (< .0001, < .0001, < .0001)     6.54 (< .0001, < .0001, < .0001)    -0.23 (.8187, .8225, .8174)
B.C.    -1.65 (.0984, .0999, .0933)          -0.99 (.3228, .3250, .3283)           2.61 (.0091, .0087, .0083)

Table 4: Test statistics and the corresponding p-values (in parentheses) for the cell-specific tests. Rows are
base species and columns are NN species. The p-values are given in the order of $p_{asy}$, $p_{mc}$, and $p_{rand}$,
whose labeling is as in Table 3. B.G. = black gums, C.A. = Carolina ashes, and B.C. = bald cypresses.



[Figure 24 plot: three panels titled "Black Gums", "Carolina Ashes", and "Bald Cypresses"; horizontal axis: t (m), ticks 0-12.]


Figure 24: Pair correlation functions for each species in the swamp tree data. Wide dashed lines around 
1 (which is the theoretical value) are the upper and lower (pointwise) 95 % confidence bounds for the pair 
correlation functions based on Monte Carlo simulation under the CSR independence pattern. 



One-vs-rest Cell-specific Tests

                Z^D_22             Z^I_22             Z^III_22
B.G.-vs-rest    5.11 (< .0001)     6.94 (< .0001)     6.93 (< .0001)
C.A.-vs-rest    3.91 (< .0001)     6.54 (< .0001)     6.54 (< .0001)
B.C.-vs-rest    4.13 (< .0001)     2.62 (.0044)       2.62 (.0044)

One-vs-rest Overall Tests

                C_D        C_I        C_II       C_III      C_IV
B.G.-vs-rest    49.27      48.09      49.27      48.09      49.27
                < .0001    < .0001    < .0001    < .0001    < .0001
C.A.-vs-rest    44.84      42.74      44.84      42.78      45.44
                < .0001    < .0001    < .0001    < .0001    < .0001
B.C.-vs-rest    17.05      6.84       17.04      6.90       17.03
                .0002      .0089      .0002      .0086      .0002

Table 5: Test statistics and p-values for one-vs-rest cell-specific tests for cell (2,2) (the corresponding p-values
are presented in parentheses) and one-vs-rest overall tests (the corresponding p-values are presented below the
test statistics). $Z^D_{22}$ stands for Dixon's cell-specific test, and $Z^I_{22}$ and $Z^{III}_{22}$ stand for types I and III cell-specific
tests. $C_D$ stands for Dixon's overall test and $C_I$ to $C_{IV}$ are for types I-IV overall tests.




Figure 25: Pair correlation functions for each pair of species in the swamp tree data. Wide dashed lines
around 1 (which is the theoretical value) are the upper and lower (pointwise) 95 % confidence bounds for the
pair correlation functions based on Monte Carlo simulations under the CSR independence pattern. B.G. =
black gums, C.A. = Carolina ashes, and B.C. = bald cypresses.



are aggregated include the mean NN distance for our data; hence this aggregation could be the reason for the
significant segregation between the species.
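The definition of the pair correlation function as K'(t)/(2πt) can be checked numerically in the benchmark case. The sketch below assumes the standard fact that Ripley's K function for a homogeneous planar Poisson process is K(t) = πt², differentiates it numerically, and recovers g(t) = 1. NumPy is assumed, and the grid of distances is arbitrary.

```python
import numpy as np

# Distances at which g is evaluated (arbitrary grid, in meters).
t = np.linspace(0.5, 12.0, 24)

# Ripley's K for a homogeneous Poisson process in the plane (assumed benchmark).
K = np.pi * t ** 2

# Numerical derivative K'(t); second-order differences are exact for a quadratic.
dK = np.gradient(K, t, edge_order=2)

# Pair correlation function g(t) = K'(t) / (2 * pi * t).
g = dK / (2 * np.pi * t)

print(np.allclose(g, 1.0))
```

With an estimated K function in place of the theoretical one, the same ratio gives an empirical g(t); in practice, smoothed estimators are typically preferred over raw numerical differentiation.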

The same definition of the pair correlation function can be applied to Ripley's bivariate K or L-functions.
Under CSR independence we have g(t) = 1; g(t) > 1 suggests association of the classes; and g(t) < 1 suggests
segregation of the classes. The bivariate pair correlation functions for the species in the swamp tree data are
plotted in Figure 25. Black gums and Carolina ashes are segregated for about 2-2.5, 3.5-4.5, 7.5-8.5, and
10.5-12 m; black gums and bald cypresses are segregated for about 2.5, 3, and 6 m; and Carolina ashes and
bald cypresses are associated for 7 and 9 m.

The pair correlation function estimates have considerably high variability for small t if g(t) > 0, and hence are
not so reliable for small distances (Stoyan and Stoyan (1996)). See, for example, Figures 24 and 25, where the
confidence bands for small t values are much wider compared to those for larger t values. So pair correlation
function analysis is more reliable for larger distances, say, larger than about the average NN distance in the
data set. While the pair correlation function provides information on the univariate and bivariate patterns
at all distances, NNCT-tests summarize the spatial interaction for distances about the average NN distance
in the data set.



11 Discussion and Conclusions 



We introduce cell-specific and overall segregation tests based on nearest neighbor contingency tables (NNCTs).
NNCT-tests are used in testing randomness in the nearest neighbor (NN) structure between two or more
classes. The overall test is used for testing any deviation from randomness in all the NNCT cells combined;
the cell-specific test for cell (i,j) is used for testing any deviation from randomness in cell (i,j), i.e., the NN structure
in which the base class is i and the NN class is j. This statistic tests the segregation or lack of it if i = j, and the
association or lack of it between classes i and j if i ≠ j. The randomness in the NN structure is implied by
the RL or CSR independence patterns. We demonstrate that under the CSR independence pattern, NNCT-tests
are conditional on Q and R, while under the RL pattern, these tests are unconditional. In the two-class
case, cell-specific tests are essentially different only for two cells, since cells (1,1) and (1,2) yield the same
test statistic in absolute value for Dixon's cell-specific test, and likewise for cells (2,1) and (2,2). Similarly, cells
(1,1) and (2,1) yield the same test statistic in absolute value for the type III cell-specific test, and likewise for
cells (1,2) and (2,2). For the type I cell-specific test, cells (1,1) and (2,2) yield the same test statistic, and the
off-diagonal cells give the negative of this value.

We demonstrate that the cell-specific tests tend to the standard normal distribution as the sample size gets
larger. On the other hand, the overall tests tend to a chi-square distribution with the corresponding degrees of
freedom with increasing sample size. Two major types of asymptotic structures for spatial data exist in the
literature: infill asymptotics and increasing domain asymptotics (Lahiri (1996)). In "infill asymptotics" the
region of interest is a fixed bounded region and the number of observed points gets larger in this region. Hence
the minimum distance between data points tends to zero as the sample size tends to infinity. In "increasing
domain asymptotics", any two observations are required to be at least a fixed distance apart; hence as the
number of observations increases, the region on which the process is observed eventually becomes unbounded
(Cressie (1993)). The sampling structure in our asymptotic sampling distribution could be either one of these
asymptotic structures, because we only consider the class sizes, and hence the total sample size, tending to
infinity regardless of the size of the study region.
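As a small illustration of how the asymptotic normality of a cell-specific statistic translates into a p-value, the sketch below converts a Z value into two-sided and upper-tail normal p-values using only the standard library. The function names are illustrative. The Z values are taken from Tables 4 and 5; the two-sided value for Z = -3.72 is consistent with the .0002 reported in Table 4, and the upper-tail value for Z = 2.62 is consistent with the .0044 reported in Table 5 (the tables were computed from unrounded statistics, so agreement is only approximate).

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def upper_tail_p(z):
    """One-sided (upper-tail) p-value, P(Z >= z), for a standard normal statistic."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p_two = two_sided_p(-3.72)   # Dixon's test, cell (B.G.,B.C.) in Table 4
p_one = upper_tail_p(2.62)   # type I one-vs-rest test for bald cypresses in Table 5
print(round(p_two, 4), round(p_one, 4))
```

An overall statistic would be referred to a chi-square distribution instead, with degrees of freedom that depend on the test type.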

Based on our Monte Carlo simulations, we observe that the asymptotic approximation for the cell-specific
tests is appropriate only when the corresponding cell count in the NNCT is larger than 10, and for the overall
test when all cell counts are at least 5. For NNCTs with smaller cell counts, we recommend the Monte Carlo
randomization of the tests. In the two-class case, types I and III cell-specific tests have better empirical size
performance for the cell corresponding to the smaller sample, while Dixon's cell-specific test has better size
performance for the cell corresponding to the larger sample. For the overall tests, the size performances of
Dixon's and types I and III tests are similar, and all of them perform better than the
type IV test. In the three-class case, types I and III cell-specific tests and the type III overall test have
better size performance. We also observe that types I and III cell-specific
tests and the type III overall test are more robust to the differences in sample sizes (i.e., differences in relative
abundance). Under the segregation alternatives, in the two-class case, types I and III cell-specific tests have
similar power estimates, which are larger than those of Dixon's, and for the overall tests, types I and III overall
tests have higher power estimates for large sample size combinations. In the three-class case, types I and
III cell-specific tests have higher power estimates, while for the overall tests, the type III test has higher power
estimates. Under the association alternatives, in the two-class case, types I and III cell-specific tests (Dixon's
cell-specific tests) have higher power estimates for the class with the smaller (larger) sample size, and for the
overall tests, types I and III overall tests have higher power estimates for large sample size combinations. In
the three-class case, types I and III cell-specific tests have higher power estimates for cell (i,j) if $n_i$ is less
than $n_j$, while Dixon's cell-specific tests have higher power estimates if $n_i$ is larger than $n_j$. For the overall
tests, Dixon's overall test has the highest power estimates. When empirical size and power performances
are considered together, among cell-specific tests, types I and III cell-specific tests are recommended against
the segregation alternatives, while types I, III, and Dixon's cell-specific tests are recommended against the
association alternatives. Among overall tests, the type III overall test is recommended against the segregation
alternatives, while Dixon's overall test is recommended against the association alternatives.
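The rule of thumb on cell counts and the Monte Carlo randomization recommended above for small cell counts can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, `stat_fn` is a placeholder for whichever cell-specific or overall statistic is of interest (the raw count of cell (1,1) is used below purely for demonstration), labels are permuted over fixed locations as in random labeling, and an upper-tail p-value is returned.

```python
import numpy as np

def nn_indices(coords):
    """Index of the nearest neighbor of each point (brute force)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    return dist.argmin(axis=1)

def nnct_from_nn(nn, labels, k):
    """NNCT with base classes in rows and NN classes in columns."""
    table = np.zeros((k, k), dtype=int)
    np.add.at(table, (labels, labels[nn]), 1)
    return table

def asymptotic_ok(nnct, cell=None):
    """Rule of thumb from the text: cell count > 10, or all counts >= 5 for overall tests."""
    if cell is not None:
        return bool(nnct[cell] > 10)
    return bool((nnct >= 5).all())

def randomization_pvalue(coords, labels, k, stat_fn, n_perm=199, seed=0):
    """Upper-tail Monte Carlo randomization p-value under random labeling."""
    rng = np.random.default_rng(seed)
    nn = nn_indices(coords)  # locations are fixed, so the NN structure is computed once
    observed = stat_fn(nnct_from_nn(nn, labels, k))
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(labels)
        if stat_fn(nnct_from_nn(nn, shuffled, k)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Synthetic, strongly segregated example: two well-separated clusters.
rng = np.random.default_rng(1)
coords = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(10.0, 1.0, (30, 2))])
labels = np.repeat([0, 1], 30)
table = nnct_from_nn(nn_indices(coords), labels, 2)
p_val = randomization_pvalue(coords, labels, 2, stat_fn=lambda t: t[0, 0])
print(table, asymptotic_ok(table), asymptotic_ok(table, cell=(0, 0)), p_val)
```

In this synthetic example the off-diagonal counts are small, so the overall rule of thumb fails and the randomization p-value is the natural choice.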

NNCT-tests summarize the pattern in the data set for small scales around the average NN distance
between all points. On the other hand, the pair correlation function g(t) and Ripley's classical K or L-functions
and other variants (Baddeley et al. (2000)) provide information on the pattern at various scales (i.e., around
other distance values). Hence NNCT-tests and pair correlation or K-functions are not comparable but provide
complementary information about the pattern in question. However, an advantage of overall NNCT-tests is
that they provide the interaction in a multi-class setting in the presence of all classes, while the second-order
analysis with K or g functions allows a comparison of pairs of classes (one at a time). Furthermore, when an
overall NNCT-test is significant, it offers various post-hoc tests to follow up the specifics of the interaction:
(i) cell-specific tests, (ii) one-class-vs-rest type tests, and (iii) class-specific tests. In the cell-specific test for
cell (i,j), the interaction between classes i and j is examined in the presence of all other classes, and in the
class i-vs-rest testing, the interaction of all the classes other than class i with class i is investigated. The pair
correlation function and K-functions can also be adapted for one-vs-rest type analysis, as class i and the
rest of the classes can be treated as the two classes in the analysis.

The course of action we recommend depends on which null hypothesis is more appropriate. If CSR
independence is the reasonable null pattern, we recommend the overall segregation test to detect the spatial
interaction at small scales at about the mean NN distance. If it yields a significant result, then to determine
which pairs of classes have significant spatial interaction, the cell-specific tests or one-vs-rest type tests can
be performed (we recommend both versions as they provide information on different aspects of the spatial
interaction). To detect spatial interaction at larger distances, the pair correlation function is recommended
(Stoyan and Penttinen (2000)), due to the cumulative nature of Ripley's K- or L-functions for larger distances.
On the other hand, if the RL pattern is the reasonable null pattern, we recommend the NNCT-tests to detect
the interaction at about the mean NN distance, and Diggle's D-function (Diggle (2003)) or a modified version
of Ripley's K function (Baddeley et al. (2000)) to detect the interaction at larger distances.



Acknowledgments 

Most of the Monte Carlo simulations presented in this article were executed at Koç University High Performance
Computing Laboratory.



References 

Baddeley, A., Møller, J., and Waagepetersen, R. (2000). Non- and semi-parametric estimation of interaction
in inhomogeneous point patterns. Statistica Neerlandica, 54(3):329-350.

Ceyhan, E. (2008). On the use of nearest neighbor contingency tables for testing spatial segregation.
Environmental and Ecological Statistics. doi:10.1007/s10651-008-0104-x.

Ceyhan, E. (2009). Class-specific tests of segregation based on nearest neighbor contingency tables. Statistica 
Neerlandica, 63(2):149-182. 

Ceyhan, E. (2010). New tests of spatial segregation based on nearest neighbor contingency tables.
Scandinavian Journal of Statistics, 37:147-165.

Cressie, N. A. C. (1993). Statistics for Spatial Data. Wiley, New York. 

Cuzick, J. and Edwards, R. (1990). Spatial clustering for inhomogeneous populations (with discussion).
Journal of the Royal Statistical Society, Series B, 52:73-104.

Diggle, P., Zheng, P., and Durr, P. (2005). Nonparametric estimation of spatial segregation in a
multivariate point process: bovine tuberculosis in Cornwall, UK. Proceedings of National Academy Sciences,
54(3):645-658.

Diggle, P. J. (2003). Statistical Analysis of Spatial Point Patterns. Hodder Arnold Publishers, London.

Dixon, P. M. (1994). Testing spatial segregation using a nearest-neighbor contingency table. Ecology, 
75(7):1940-1948. 

Dixon, P. M. (2002). Nearest-neighbor contingency table analysis of spatial segregation for several species.
Ecoscience, 9(2):142-151.

Fossett, M. (2011). Generative models of segregation: Investigating model-generated patterns of residential 
segregation by ethnicity and socioeconomic status. The Journal of Mathematical Sociology, 35(1-3):114-145. 



Good, B. J. and Whipple, S. A. (1982). Tree spatial patterns: South Carolina bottomland and swamp forests. 
Bulletin of the Torrey Botanical Club, 109:529-536. 

Goreaud, F. and Pélissier, R. (2003). Avoiding misinterpretation of biotic interactions with the intertype
$K_{12}$-function: population independence vs. random labelling hypotheses. Journal of Vegetation Science,
14(5):681-692.

Hamill, D. M. and Wright, S. J. (1986). Testing the dispersion of juveniles relative to adults: A new analytical 
method. Ecology, 67(2):952-957. 

Henry, A. D., Pralat, P., and Zhang, C. (2011). Emergence of segregation in evolving social networks.
Proceedings of National Academy Sciences, 108(21):8605-8610.

Kulldorff, M. (2006). Tests for spatial randomness adjusted for an inhomogeneity: A general framework.
Journal of the American Statistical Association, 101(475):1289-1305.

Lahiri, S. N. (1996). On consistency of estimators based on spatial data under infill asymptotics. Sankhya: 
The Indian Journal of Statistics, Series A, 58(3):403-417. 

Pielou, E. C. (1961). Segregation and symmetry in two-species populations as studied by nearest-neighbor
relationships. Journal of Ecology, 49(2):255-269.

Ripley, B. D. (2004). Spatial Statistics. Wiley-Interscience, New York.

Robertson, S. L. and Cushing, J. M. (2011). Spatial segregation in stage-structured populations with an 
application to Tribolium. Journal of Biological Dynamics, 5(5):398-409. 

Searle, S. R. (2006). Matrix Algebra Useful for Statistics. Wiley-Interscience.

Stoyan, D. and Penttinen, A. (2000). Recent applications of point process methods in forestry statistics.
Statistical Science, 15(1):61-78.

Stoyan, D. and Stoyan, H. (1994). Fractals, random shapes and point fields: methods of geometrical statistics. 
John Wiley and Sons, New York. 

Stoyan, D. and Stoyan, H. (1996). Estimating pair correlation functions of planar cluster processes. Biomet- 
rical Journal, 38(3):259-271. 

van Lieshout, M. N. M. and Baddeley, A. J. (1999). Indices of dependence between types in multivariate
point patterns. Scandinavian Journal of Statistics, 26:511-532.

Whipple, S. A. (1980). Population dispersion patterns of trees in a Southern Louisiana hardwood forest. 
Bulletin of the Torrey Botanical Club, 107:71-76. 



